We all know deep learning isn’t very deep. Over the decades, through AI springs and winters, teams have tried to deliver higher-order reasoning and give machines human-like intelligence. Now researchers have come up with a new dataset that shows just how limited current models are at reasoning, and gives model designers something to test against when building smarter ones.

“Known as CLEVRER, the data set consists of 20,000 short synthetic video clips and more than 300,000 question and answer pairings that reason about the events in the videos. Each video shows a simple world of toy objects that collide with one another following simulated physics. In one, a red rubber ball hits a blue rubber cylinder, which continues on to hit a metal cylinder.”

Created by Harvard, DeepMind, and MIT-IBM Watson AI Lab researchers, CLEVRER is meant to help evaluate just how well newly designed AI systems can reason in the real world. The team tested state-of-the-art machine learning models against the new dataset and found most of them failed badly.

The team then tried an approach that’s been a long time coming: mixing old-school symbolic reasoning with deep learning.

Find out what they discovered in this article from MIT Tech Review.