Monthly Archives: October 2016

Reproduces aspects of human neurology

MIT researchers and their colleagues have developed a new computational model of the human brain’s face-recognition mechanism that seems to capture aspects of human neurology that previous models have missed.

The researchers designed a machine-learning system that implemented their model, and they trained it to recognize particular faces by feeding it a battery of sample images. They found that the trained system included an intermediate processing step that represented a face’s degree of rotation — say, 45 degrees from center — but not the direction — left or right.

This property wasn’t built into the system; it emerged spontaneously from the training process. But it duplicates an experimentally observed feature of the primate face-processing mechanism. The researchers consider this an indication that their system and the brain are doing something similar.

“This is not a proof that we understand what’s going on,” says Tomaso Poggio, a professor of brain and cognitive sciences at MIT and director of the Center for Brains, Minds, and Machines (CBMM), a multi-institution research consortium funded by the National Science Foundation and headquartered at MIT. “Models are kind of cartoons of reality, especially in biology. So I would be surprised if things turn out to be this simple. But I think it’s strong evidence that we are on the right track.”

Indeed, the researchers’ new paper includes a mathematical proof that the particular type of machine-learning system they use, which was intended to offer what Poggio calls a “biologically plausible” model of the nervous system, will inevitably yield intermediate representations that are indifferent to angle of rotation.
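To make the idea of a representation that registers how far a face is turned, but not which way, more concrete, here is a toy sketch in Python. It is not the researchers' model: the one-dimensional "faces", the template, and the function names are all hypothetical, and the symmetry is built in by hand here, whereas in the system described above it emerged from training. The sketch scores an image against a template and against the template's mirror image, then sums the squared matches, so flipping the image left to right cannot change the score.

```python
def mirror(pixels):
    """Flip a 1-D row of pixel intensities left to right."""
    return pixels[::-1]


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def mirror_symmetric_response(image, template):
    """Sum of squared matches against a template and its mirror image.

    Because dot(mirror(image), template) == dot(image, mirror(template)),
    mirroring the image only swaps the two terms, so the sum is unchanged.
    """
    return dot(image, template) ** 2 + dot(image, mirror(template)) ** 2


# Hypothetical 1-D "faces": the brighter side hints at which way the head is turned.
face_left_45 = [9, 7, 4, 2, 1]
face_right_45 = mirror(face_left_45)
face_frontal = [1, 5, 9, 5, 1]
template = [3, 2, 1, 0, 0]  # hypothetical stored template

left = mirror_symmetric_response(face_left_45, template)
right = mirror_symmetric_response(face_right_45, template)
frontal = mirror_symmetric_response(face_frontal, template)

print(left == right)    # same response for left and right turns
print(left != frontal)  # but a different response for a frontal view
```

In this toy, the two turned faces are indistinguishable to the unit, while the frontal face produces a different response, which mirrors the "degree but not direction" behavior in miniature.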

Poggio, who is also a principal investigator at MIT’s McGovern Institute for Brain Research, is the senior author on a paper describing the new work, which appeared today in the journal Computational Biology. He’s joined on the paper by several other members of both the CBMM and the McGovern Institute: first author Joel Leibo, a researcher at Google DeepMind, who earned his PhD in brain and cognitive sciences from MIT with Poggio as his advisor; Qianli Liao, an MIT graduate student in electrical engineering and computer science; Fabio Anselmi, a postdoc in the IIT@MIT Laboratory for Computational and Statistical Learning, a joint venture of MIT and the Italian Institute of Technology; and Winrich Freiwald, an associate professor at the Rockefeller University.

Fabricate drones with a wide range

This fall’s new Federal Aviation Administration regulations have made drone flight easier than ever for both companies and consumers. But what if the drones out on the market aren’t exactly what you want?

A new system from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) is the first to allow users to design, simulate, and build their own custom drone. Users can change the size, shape, and structure of their drone based on the specific needs they have for payload, cost, flight time, battery usage, and other factors.

To demonstrate, researchers created a range of unusual-looking drones, including a five-rotor “pentacopter” and a rabbit-shaped “bunnycopter” with propellers of different sizes and rotors of different heights.

“This system opens up new possibilities for how drones look and function,” says MIT Professor Wojciech Matusik, who oversaw the project in CSAIL’s Computational Fabrication Group. “It’s no longer a one-size-fits-all approach for people who want to make and use drones for particular purposes.”

The interface lets users design drones with different propellers, rotors, and rods. It also guarantees that the drones it fabricates can take off, hover, and land, which is no simple task considering the intricate technical trade-offs associated with drone weight, shape, and control.

“For example, adding more rotors generally lets you carry more weight, but you also need to think about how to balance the drone to make sure it doesn’t tip,” says PhD student Tao Du, who was first author on a related paper about the system. “Irregularly shaped drones are very difficult to stabilize, which means that they require establishing very complex control parameters.”
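The balance problem Du describes can be illustrated with a simplified hover check. The Python sketch below is not the CSAIL system: it assumes every rotor pushes straight up, ignores the yaw torque from spinning propellers, and uses hypothetical names and numbers. It only tests two conditions for steady hover: total thrust equals the drone's weight, and the thrusts produce no net tipping torque about the center of mass.

```python
import math


def net_thrust_and_torque(rotors):
    """Return (total thrust, roll torque, pitch torque).

    Each rotor is (x, y, thrust): position in meters relative to the
    center of mass, and upward thrust in newtons. For a vertical force f
    at (x, y, 0), the torque r x F is (y * f, -x * f, 0).
    """
    total = sum(f for _, _, f in rotors)
    torque_x = sum(y * f for _, y, f in rotors)
    torque_y = sum(-x * f for x, _, f in rotors)
    return total, torque_x, torque_y


def can_hover(rotors, weight, tol=1e-9):
    """True if thrust balances weight and there is no net tipping torque."""
    total, torque_x, torque_y = net_thrust_and_torque(rotors)
    return (
        math.isclose(total, weight, abs_tol=tol)
        and abs(torque_x) <= tol
        and abs(torque_y) <= tol
    )


WEIGHT = 10.0  # newtons, hypothetical

# Symmetric quadcopter: equal thrusts balance automatically.
quad = [(1, 1, 2.5), (1, -1, 2.5), (-1, 1, 2.5), (-1, -1, 2.5)]

# Same frame plus a fifth rotor off to one side, all thrusts equal.
penta_equal = [(1, 1, 2.0), (1, -1, 2.0), (-1, 1, 2.0), (-1, -1, 2.0), (2, 0, 2.0)]

print(can_hover(quad, WEIGHT))         # True
print(can_hover(penta_equal, WEIGHT))  # False: the extra rotor tips the drone
```

With the fifth rotor attached, equal thrusts no longer work, so each rotor would need its own setting. That is a small-scale version of why irregular shapes call for more complex control parameters.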

Du and Matusik co-authored the paper with PhD student Adriana Schulz, postdoc Bo Zhu, and Assistant Professor Bernd Bickel of IST Austria. It will be presented next week at the annual SIGGRAPH Asia conference in Macao, China.

Today’s commercial drones come in only a small range of options, typically with an even number of rotors and upward-facing propellers. But there are many emerging use cases for other kinds of drones. For example, having an odd number of rotors might create a clearer view for a drone’s camera, or allow the drone to carry objects with unusual shapes.

Designing these less conventional drones, however, often requires expertise in multiple disciplines, including control systems, fabrication, and electronics.

Machine learning system

In recent years, computers have gotten remarkably good at recognizing speech and images: Think of the dictation software on most cellphones, or the algorithms that automatically identify people in photos posted to Facebook.

But recognition of natural sounds, such as crowds cheering or waves crashing, has lagged behind. That’s because most automated recognition systems, whether they process audio or visual information, are the result of machine learning, in which computers search for patterns in huge compendia of training data. Usually, the training data must first be annotated by hand, which is prohibitively expensive for all but the highest-demand applications.

Sound recognition may be catching up, however, thanks to researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). At the Neural Information Processing Systems conference next week, they will present a sound-recognition system that outperforms its predecessors but didn’t require hand-annotated data during training.

Instead, the researchers trained the system on video. First, existing computer vision systems that recognize scenes and objects categorized the images in the video. The new system then found correlations between those visual categories and natural sounds.
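The two-step recipe in the preceding paragraph, in which a vision system supplies the labels and a sound model learns from them, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the clip names, the two-number audio features, the stand-in `vision_label` function, and the nearest-centroid classifier are all hypothetical and far simpler than the researchers' actual system.

```python
from collections import defaultdict


def train_sound_model(clips, vision_label):
    """Learn one average audio feature vector (centroid) per visual category.

    clips: list of (frame, audio_features) pairs taken from unlabeled video.
    vision_label: callable mapping a frame to a category name. It stands in
    for an existing, already trained computer vision system.
    """
    sums = {}
    counts = defaultdict(int)
    for frame, audio in clips:
        label = vision_label(frame)  # no human annotation involved
        if label not in sums:
            sums[label] = [0.0] * len(audio)
        sums[label] = [s + a for s, a in zip(sums[label], audio)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify_sound(centroids, audio):
    """Return the category whose centroid is closest to the audio features."""
    def sq_dist(center):
        return sum((a - c) ** 2 for a, c in zip(audio, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def vision_label(frame):
    """Hypothetical vision system: reads the scene from the clip name."""
    return frame.split("_")[0]


clips = [
    ("beach_01", [0.9, 0.1]),
    ("beach_02", [0.8, 0.2]),
    ("stadium_01", [0.1, 0.9]),
    ("stadium_02", [0.2, 0.8]),
]

model = train_sound_model(clips, vision_label)
print(classify_sound(model, [0.85, 0.1]))  # beach
print(classify_sound(model, [0.15, 0.9]))  # stadium
```

Once trained, the sound model is used on audio alone; the vision system is needed only to generate training labels.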

“Computer vision has gotten so good that we can transfer it to other domains,” says Carl Vondrick, an MIT graduate student in electrical engineering and computer science and one of the paper’s two first authors. “We’re capitalizing on the natural synchronization between vision and sound. We scale up with tons of unlabeled video to learn to understand sound.”

Fully automated speech recognition

Speech recognition systems, such as those that convert speech to text on cellphones, are generally the result of machine learning. A computer pores through thousands or even millions of audio files and their transcriptions, and learns which acoustic features correspond to which typed words.

But transcribing recordings is costly, time-consuming work, which has limited speech recognition to a small subset of languages spoken in wealthy nations.

At the Neural Information Processing Systems conference this week, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are presenting a new approach to training speech-recognition systems that doesn’t depend on transcription. Instead, their system analyzes correspondences between images and spoken descriptions of those images, as captured in a large collection of audio recordings. The system then learns which acoustic features of the recordings correlate with which image characteristics.
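One common way to learn correspondences of this kind is to score each image against each spoken description and penalize cases where a mismatched pair scores nearly as high as the true pair. The Python sketch below shows such a margin-based ranking loss on toy vectors. It is an assumption for illustration, not a description confirmed by the text above: the embeddings, the dot-product score, the margin value, and the function names are all hypothetical.

```python
def dot(a, b):
    """Plain dot product, used here as the image/audio similarity score."""
    return sum(x * y for x, y in zip(a, b))


def ranking_loss(image_vecs, audio_vecs, margin=1.0):
    """Hinge-style loss over image/audio pairs.

    image_vecs[i] and audio_vecs[i] are assumed to describe the same scene.
    The loss is zero when every true pair outscores every mismatched pair
    by at least `margin`, and grows as mismatched pairs score too high.
    """
    loss = 0.0
    n = len(image_vecs)
    for i in range(n):
        matched = dot(image_vecs[i], audio_vecs[i])
        for j in range(n):
            if j == i:
                continue
            mismatched = dot(image_vecs[i], audio_vecs[j])
            loss += max(0.0, margin - matched + mismatched)
    return loss


# Hypothetical 2-D embeddings for two images and their spoken descriptions.
images = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[2.0, 0.0], [0.0, 2.0]]   # descriptions match their images
audio_swapped = [[0.0, 2.0], [2.0, 0.0]]   # descriptions attached to the wrong images

print(ranking_loss(images, audio_aligned))  # 0.0
print(ranking_loss(images, audio_swapped))  # 6.0
```

A training procedure would adjust the models that produce these vectors to drive the loss down, so that no transcription is needed, only the pairing between each image and its recording.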

“The goal of this work is to try to get the machine to learn language more like the way humans do,” says Jim Glass, a senior research scientist at CSAIL and a co-author on the paper describing the new system. “The current methods that people use to train up speech recognizers are very supervised. You get an utterance, and you’re told what’s said. And you do this for a large body of data.

“Big advances have been made — Siri, Google — but it’s expensive to get those annotations, and people have thus focused on, really, the major languages of the world. There are 7,000 languages, and I think less than 2 percent have ASR [automatic speech recognition] capability, and probably nothing is going to be done to address the others. So if you’re trying to think about how technology can be beneficial for society at large, it’s interesting to think about what we need to do to change the current situation. And the approach we’ve been taking through the years is looking at what we can learn with less supervision.”