Home > Uncategorized > Weekly QuEST Discussion Topics and News, 25 Mar

Weekly QuEST Discussion Topics and News, 25 Mar

Dynamic Memory Network, out of MetaMind will be discussed.  Although we started this discussion two weeks ago – the importance of their effort warrants a more in depth consideration for its implications to QuEST.



Taking Baby Steps Toward Software That Reasons Like Humans



Richard Socher, founder and chief executive of MetaMind, a start-up developing artificial intelligence software. Credit Jim Wilson/The New York Times

Richard Socher appeared nervous as he waited for his artificial intelligence program to answer a simple question: “Is the tennis player wearing a cap?”

The word “processing” lingered on his laptop’s display for what felt like an eternity. Then the program offered the answer a human might have given instantly: “Yes.”

Mr. Socher, who clenched his fist to celebrate his small victory, is the founder of one of a torrent of Silicon Valley start-ups intent on pushing variations of a new generation of pattern recognition software, which, when combined with increasingly vast sets of data, is revitalizing the field of artificial intelligence.

His company MetaMind, which is in crowded offices just off the Stanford University campus in Palo Alto, Calif., was founded in 2014 with $8 million in financial backing from Marc Benioff, chief executive of the business software company Salesforce, and the venture capitalist Vinod Khosla.

MetaMind is now focusing on one of the most daunting challenges facing A.I. software.Computers are already on their way to identifying objects in digital images or converting sounds uttered by human voices into natural language. But the field of artificial intelligence has largely stumbled in giving computers the ability to reason in ways that mimic human thought.

Now a variety of machine intelligence software approaches known as “deep learning” or “deep neural nets” are taking baby steps toward solving problems like a human.

On Sunday, MetaMind published a paper describing advances its researchers have made in creating software capable of answering questions about the contents of both textual documents and digital images.

The new research is intriguing because it indicates that steady progress is being made toward “conversational” agents that can interact with humans. The MetaMind results also underscore how far researchers have to go to match human capabilities.

Other groups have previously made progress on discrete problems, but generalized systems that approach human levels of understanding and reasoning have not been developed.

Five years ago, IBM’s Watson system demonstrated that it was possible to outperform humans on “Jeopardy!”

Last year, Microsoft developed a “chatbot” program known as Xiaoice (pronouncedShao-ice) that is designed to engage humans in extended conversation on a diverse set of general topics.

To add to Xiaoice’s ability to offer realistic replies, the company developed a huge library of human question-and-answer interactions mined from social media sites in China. This made it possible for the program to respond convincingly to typed questions or statements from users.

In 2014, computer scientists at Google, Stanford and other research groups made significant advances in what is described as “scene understanding,” the ability tounderstand and describe a scene or picture in natural language, by combining the output of different types of deep neural net programs.

These programs were trained on images that humans had previously described. The approach made it possible for the software to examine a new image and describe it with a natural-language sentence.

While even machine vision is not yet a solved problem, steady, if incremental, progress continues to be made by start-ups like Mr. Socher’s; giant technology companies such as Facebook, Microsoft and Google; and dozens of research groups.

In their recent paper, the MetaMind researchers argue that the company’s approach, known as a dynamic memory network, holds out the possibility of simultaneously processing inputs including sound, sight and text. ** fusion **

The design of MetaMind software is evidence that neural network software technologies are becoming more sophisticated, in this case by adding the ability both to remember a sequence of statements and to focus on portions of an image. For example, a question like “What is the pattern on the cat’s fur on its tail?” might yield the answer “stripes” and show that the program had focused only on the cat’s tail to arrive at its answer.

“Another step toward really understanding images is, are you actually able to answer questions that have a right or wrong answer?” Mr. Socher said.

MetaMind is using the technology for commercial applications like automated customer support, he said. For example, insurance companies have asked if the MetaMind technology could respond to an email with an attached photo — perhaps of damage to a car or other property — he said.

There are two papers that we will use for the technical detail:

Ask Me Anything: Dynamic Memory Networks
for Natural Language Processing:

  • Most tasks in natural language processing can be cast into question answering (QA) problems over language input.  ** way we cast QuEST  Query response**
  • We introduce the dynamic memory network (DMN), a unified neural network frameworkwhich processes input sequences and questions, forms semantic and episodic memories, andgenerates relevant answers.
  • The DMN can be trained end-to-end and obtains state of the art results on several types of tasks and datasets:
  • question answering (Facebook’s bAbI dataset),
  • sequence modeling for part of speech tagging (WSJ-PTB),
  • and text classification for sentiment analysis (Stanford Sentiment Treebank).
  • The model relies exclusively on trained word vector representations and requires no string matching or manually engineered features.


The second paper:

Dynamic Memory Networks for Visual and Textual Question Answering
Xiong, Merity, Socher – arXiv:1603.01417v1 [cs.NE] 4 Mar 2016

  • Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.
  • One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks.

–     However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images.

–     Based on an analysis of the DMN, we propose several improvements to its memory and input modules.

–     Together with these changes we introduce a novel input module for images in order to be able to answer visual questions.

–     Our new DMN+ model improves the state of the art on both the

  • Visual Question Answering dataset and
  • the bAbI-10k text question-answering dataset without supporting fact supervision.

news summary (5)



Categories: Uncategorized
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: