Archive for March, 2016

Weekly QuEST Discussion Topics and News, 1 Apr

March 31, 2016

QuEST April 1, 2016

The topic this week is a discussion about the unexpected query – specifically ‘zero-shot learning’.  We will use an article by Socher / Manning / Ng from NIPS 2013:

Zero-Shot Learning Through Cross-Modal Transfer
Richard Socher, Milind Ganjoo, Christopher D. Manning, Andrew Y. Ng

  • This work introduces a model that can recognize objects in images even if no training data is available for the object class.
  • The only necessary knowledge about unseen visual categories comes from unsupervised text corpora.

This relates to the question of the unexpected query, but here the query is unexpected with respect to the image classification system, not the word / text processing system. That makes it a sort of transfer learning issue: transfer between systems.

  • Unlike previous zero-shot learning models, which can only differentiate between unseen classes, our model can operate on a mixture of seen and unseen classes, simultaneously obtaining state of the art performance on classes with thousands of training images and reasonable performance on unseen classes.
  • This is achieved by seeing the distributions of words in texts as a semantic space for understanding what objects look like.
  • Our deep learning model does not require any manually defined semantic or visual features for either words or images.
  • Images are mapped to be close to semantic word vectors corresponding to their classes, and the resulting image embeddings can be used to distinguish whether an image is of a seen or unseen class.
  • We then use novelty detection methods to differentiate unseen classes from seen classes.
  • We demonstrate two novelty detection strategies; the first gives high accuracy on unseen classes, while the second is conservative in its prediction of novelty and keeps the seen classes’ accuracy high.
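
The seen / unseen decision described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's model: the two-dimensional "word vectors", the class names, the `threshold` value and the helpers `nearest` and `classify` are all made up for the example, and the image is assumed to have already been mapped into the word-vector space. An embedding that lies far from every seen-class vector is treated as novel and assigned to the nearest unseen-class vector; otherwise it is assigned to the nearest seen class.

```python
import math

# Hypothetical 2-D "word vectors"; real ones would be learned from text corpora.
SEEN = {"cat": (1.0, 0.0), "dog": (0.9, 0.3)}
UNSEEN = {"truck": (-1.0, 0.2)}


def nearest(embedding, classes):
    """Return (label, distance) of the class vector closest to the embedding."""
    return min(
        ((label, math.dist(embedding, vec)) for label, vec in classes.items()),
        key=lambda pair: pair[1],
    )


def classify(embedding, threshold=0.5):
    """Label an image embedding, treating far-from-seen points as novel."""
    label, distance = nearest(embedding, SEEN)
    if distance <= threshold:
        return label, "seen"
    # Novel: fall back to the unseen classes, known only through word vectors.
    label, _ = nearest(embedding, UNSEEN)
    return label, "unseen"
```

Raising `threshold` makes this toy detector more conservative about declaring novelty, which loosely mirrors the trade-off between the two strategies mentioned in the abstract.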

news summary (6)

Categories: Uncategorized

Weekly QuEST Discussion Topics and News, 25 Mar

March 24, 2016

The Dynamic Memory Network, out of MetaMind, will be discussed.  Although we started this discussion two weeks ago, the importance of their effort warrants a more in-depth consideration of its implications for QuEST.

Taking Baby Steps Toward Software That Reasons Like Humans



Richard Socher, founder and chief executive of MetaMind, a start-up developing artificial intelligence software. Credit Jim Wilson/The New York Times

Richard Socher appeared nervous as he waited for his artificial intelligence program to answer a simple question: “Is the tennis player wearing a cap?”

The word “processing” lingered on his laptop’s display for what felt like an eternity. Then the program offered the answer a human might have given instantly: “Yes.”

Mr. Socher, who clenched his fist to celebrate his small victory, is the founder of one of a torrent of Silicon Valley start-ups intent on pushing variations of a new generation of pattern recognition software, which, when combined with increasingly vast sets of data, is revitalizing the field of artificial intelligence.

His company MetaMind, which is in crowded offices just off the Stanford University campus in Palo Alto, Calif., was founded in 2014 with $8 million in financial backing from Marc Benioff, chief executive of the business software company Salesforce, and the venture capitalist Vinod Khosla.

MetaMind is now focusing on one of the most daunting challenges facing A.I. software. Computers are already on their way to identifying objects in digital images or converting sounds uttered by human voices into natural language. But the field of artificial intelligence has largely stumbled in giving computers the ability to reason in ways that mimic human thought.

Now a variety of machine intelligence software approaches known as “deep learning” or “deep neural nets” are taking baby steps toward solving problems like a human.

On Sunday, MetaMind published a paper describing advances its researchers have made in creating software capable of answering questions about the contents of both textual documents and digital images.

The new research is intriguing because it indicates that steady progress is being made toward “conversational” agents that can interact with humans. The MetaMind results also underscore how far researchers have to go to match human capabilities.

Other groups have previously made progress on discrete problems, but generalized systems that approach human levels of understanding and reasoning have not been developed.

Five years ago, IBM’s Watson system demonstrated that it was possible to outperform humans on “Jeopardy!”

Last year, Microsoft developed a “chatbot” program known as Xiaoice (pronounced Shao-ice) that is designed to engage humans in extended conversation on a diverse set of general topics.

To add to Xiaoice’s ability to offer realistic replies, the company developed a huge library of human question-and-answer interactions mined from social media sites in China. This made it possible for the program to respond convincingly to typed questions or statements from users.

In 2014, computer scientists at Google, Stanford and other research groups made significant advances in what is described as “scene understanding,” the ability to understand and describe a scene or picture in natural language, by combining the output of different types of deep neural net programs.

These programs were trained on images that humans had previously described. The approach made it possible for the software to examine a new image and describe it with a natural-language sentence.

While even machine vision is not yet a solved problem, steady, if incremental, progress continues to be made by start-ups like Mr. Socher’s; giant technology companies such as Facebook, Microsoft and Google; and dozens of research groups.

In their recent paper, the MetaMind researchers argue that the company’s approach, known as a dynamic memory network, holds out the possibility of simultaneously processing inputs including sound, sight and text. ** fusion **

The design of MetaMind software is evidence that neural network software technologies are becoming more sophisticated, in this case by adding the ability both to remember a sequence of statements and to focus on portions of an image. For example, a question like “What is the pattern on the cat’s fur on its tail?” might yield the answer “stripes” and show that the program had focused only on the cat’s tail to arrive at its answer.

“Another step toward really understanding images is, are you actually able to answer questions that have a right or wrong answer?” Mr. Socher said.

MetaMind is using the technology for commercial applications like automated customer support, he said. For example, insurance companies have asked if the MetaMind technology could respond to an email with an attached photo — perhaps of damage to a car or other property — he said.

There are two papers that we will use for the technical detail:

Ask Me Anything: Dynamic Memory Networks for Natural Language Processing:

  • Most tasks in natural language processing can be cast into question answering (QA) problems over language input.  ** way we cast QuEST  Query response**
  • We introduce the dynamic memory network (DMN), a unified neural network framework which processes input sequences and questions, forms semantic and episodic memories, and generates relevant answers.
  • The DMN can be trained end-to-end and obtains state of the art results on several types of tasks and datasets:
  • question answering (Facebook’s bAbI dataset),
  • sequence modeling for part of speech tagging (WSJ-PTB),
  • and text classification for sentiment analysis (Stanford Sentiment Treebank).
  • The model relies exclusively on trained word vector representations and requires no string matching or manually engineered features.
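
To make the idea of attending to facts over several passes more concrete, here is a small Python sketch. It is a simplified stand-in and not the DMN itself: the function `episodic_passes`, the dot-product scoring and the additive memory update are assumptions chosen for brevity, whereas the paper uses learned gates and recurrent (GRU) updates. The facts and the question are assumed to be already encoded as vectors of equal length.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def episodic_passes(facts, question, passes=2):
    """Run several attention passes over the facts, updating a memory vector."""
    memory = list(question)  # the memory starts out as the question
    weights = []
    for _ in range(passes):
        # Score each fact against both the question and the current memory.
        scores = [dot(f, question) + dot(f, memory) for f in facts]
        weights = softmax(scores)
        # The episode is the attention-weighted sum of the facts.
        episode = [
            sum(w * f[i] for w, f in zip(weights, facts))
            for i in range(len(question))
        ]
        # Simplified update (an assumption): add the episode to the memory.
        memory = [m + e for m, e in zip(memory, episode)]
    return memory, weights
```

For example, `episodic_passes([(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)], (1.0, 0.0))` puts most of its attention weight on the first fact, the one aligned with the question. An answer module would then read the returned memory; that part is left out here.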


The second paper:

Dynamic Memory Networks for Visual and Textual Question Answering
Xiong, Merity, Socher – arXiv:1603.01417v1 [cs.NE] 4 Mar 2016

  • Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.
  • One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks.

  • However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images.
  • Based on an analysis of the DMN, we propose several improvements to its memory and input modules.
  • Together with these changes we introduce a novel input module for images in order to be able to answer visual questions.
  • Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.

news summary (5)



Categories: Uncategorized

No QuEST Meeting this week

March 16, 2016

Attached are the news stories, but because Cap is travelling there will be no meeting. Cap is happy to engage via email with anyone who has something they wanted to discuss. Cap's focus this week was on two efforts: formulating, with our colleague Bernard A., a revision to the DRAW approach that could be used to generate the artificial conscious representation for a QuEST agent; and working with our remote colleagues Oliver N. and Andres R. on a new approach to deep learning that results in better encoder weights and better semantic metadata for video from the wild.

news summary (4)

Categories: Uncategorized

Weekly QuEST Discussion Topics and News, 11 Mar

March 10, 2016

QuEST 11 March 2016:

We again have multiple topics this week:

I want to catch everyone up briefly on where we landed on situation understanding – great discussion last week and it solidified a position that I want

Meaning is the changes in an agent’s representation resulting from a query.

Understanding is the impact of that meaning on accomplishing a particular task.

We can also finish the discussion around issues and defining characteristics of understanding and relating it to ISR mission capabilities:

  • Bloom considerations
  • More than generating an acceptable response
  • Transfer of knowledge
  • Coverage – understanding not necessarily enhanced by more data
  • Using knowledge flexibly

We also want to briefly discuss DRAW (Deep Recurrent Attentive Writer): a recurrent neural network for image generation. The purpose of this discussion is to get everyone thinking about how we can generate our ‘artificially conscious’ representation in QuEST agents. That representation requires a confabulation that is situated, simulated and structurally coherent. The DRAW article was recently investigated by our team as an option. We want to discuss the work and where we found limitations / challenges. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational auto-encoding framework that allows for the iterative construction of complex images. DRAW is just one example of a generative model that we have been considering – we will discuss others later.
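
As a rough picture of "iterative construction" through an attention window, the toy Python sketch below builds a one-dimensional "image" by repeatedly writing into whichever window currently has the largest error. Everything in it is an assumption for illustration: the function `draw_steps`, the greedy window choice and the direct access to the target are not how DRAW works, since DRAW learns its attention and write operations and samples latent variables at each step. Only the additive canvas update through a moving window is meant to carry over.

```python
def draw_steps(target, steps=4, window=2):
    """Build up a 1-D 'image' by repeatedly writing into an attended window."""
    canvas = [0.0] * len(target)
    for _ in range(steps):
        error = [t - c for t, c in zip(target, canvas)]
        # Attention: pick the window with the largest total absolute error.
        start = max(
            range(len(target) - window + 1),
            key=lambda s: sum(abs(e) for e in error[s:s + window]),
        )
        # Write: add a correction to the canvas inside the window only.
        for i in range(start, start + window):
            canvas[i] += error[i]
    return canvas
```

Calling `draw_steps` with `steps=1` fills in only the most salient region of the target; later steps move the window elsewhere and refine the rest of the canvas.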

Another topic we’ve been spending time on recently is the Dynamic Memory Network, out of MetaMind.


news summary (3)

Categories: Uncategorized

Weekly QuEST Discussion Topics and News, 4 Mar

QuEST 4 March 2016


We again have multiple topics this week:

Our colleagues Jared C and OX have spent some time capturing what we would like to achieve with a QuEST Theory of Knowledge (ToK).  We will start this week having a discussion to help us refine the goals of that effort.

QuEST Theory of Knowledge

By a QuEST Theory of Knowledge (QToK), we mean a collection of mathematical models, along with algorithms and theorems that would allow us to begin to answer questions such as:

What can a team of agents know and how can it reason in a given environment?

What knowledge and processing is required for a given task?

Such a theory would enable the matching of (a team of) agents to (a set of) tasks with an expectation of a certain level of performance (e.g., one could provide more training or computer-based decision aids to a human team, tailored for making particular decisions in a specific environment). …

The second topic is understanding versus awareness. I am giving a plenary talk at an upcoming conference and the topic is ‘The QuEST for multi-sensor big data ISR situation understanding’. So I’ve been trying to resolve my issues with the unexpected query as the means to communicate the hole we are attempting to fill, and to cast that hole as understanding, playing off our efforts to define meaning.

I’m now casting the discussion around issues and defining characteristics (currently the 9 below) of understanding and relating it to ISR mission capabilities:



  • More than generating an acceptable response
  • Problem of inert ideas
  • Agent centric
  • Requires transfer of knowledge
  • Unexpected queries
  • Coverage – not attained by just adding more data
  • Using knowledge flexibly
  • Updating knowledge

news summary (2)

Categories: Uncategorized