Can You See the Grand Canyon From Space
Standing behind a podium at MIT's Center for Brains, Minds and Machines, Amnon Shashua began a speech that wouldn't have been out of place at the hyperbole-happy TechCrunch Disrupt conference. In front of an audience of neuroscientists and computer scientists, however, he recognized that a speech titled "Computer Vision That Is Changing Our World," which suggests that our future is in large part being built by cameras, needed a reality check up front.
"All smartphones accept cameras," Shashua acknowledged at the event in March. "Simply you cannot state that what the cameras on smartphones do today will change our world.
"I am going to talk about something else related to cameras."
Shashua is one of the world's leading experts in the field of computer vision—the ability of computers to process what they see in the same way humans do. We already use early forms of computer vision when our smartphones identify faces and stitch together panoramic photos, but beyond that, they don't act very much on what they see.
The cameras Shashua is building, however, are what allow him to zip along Israeli highways in a customized self-driving car. In partnerships with Hyundai and Tesla, among other auto companies, they empower cars already on American roads to detect what's around them, calculate the chances of a collision, and automatically brake if an accident appears inevitable. These smart cameras are the core technology of Mobileye, the company Shashua cofounded with the ambition of eliminating automobile accidents within 20 years.
Considering that 1.3 million people are killed on the world's roads each year—37,000 in the United States alone—Mobileye is an example of what Shashua means when he says computer vision is changing the world. But seemingly aware that just one example of the technology's abilities, however impressive, may not be enough to convince people of its potential, the man who endowed self-driving cars with sight turned the focus of his speech to those who cannot see.
"In the U.S., in that location are about 25 million visually impaired [people]," he said. "Corrective lenses cannot correct their disability. They cannot read anymore. They cannot negotiate in the outdoors. […] And this segment of gild doesn't have real technology to help them."
Shashua realized that if cameras could see for cars, they ought to be able to see for people as well. In 2010, he cofounded OrCam with Ziv Aviram, Mobileye's CEO, to develop a smart camera that clips onto glasses and reads the world to the visually impaired, from restaurant menus to crosswalk signals.
But OrCam is just one of a growing gaggle of companies and researchers pushing the boundaries of artificial intelligence and finding themselves creating an ecosystem of technologies that, among other things, will soon allow the visually impaired to engage with the world like never before. As computer vision continues to evolve, cameras can now look at images and translate them into plain-language descriptions. Soon, 3D mapping technology will guide people through unfamiliar environments like digital guide dogs. Meanwhile, technology companies industry-wide are in an innovation arms race to imbue smartphones with the processing power they need so anyone can access all these features by simply reaching into their pockets.
Though most of these technologies are in their beta stages, their continued evolution and eventual convergence seems likely to result in, among other feats, unprecedented independence for the visually impaired.
Back at the Center for Brains, Minds and Machines, Shashua played a clip from a conference last summer, where Devorah Shaltiel, one of OrCam's users, took the stage. Wearing sunglasses with a camera clipped on by her right temple, she described the experience of eating lunch with a friend.
"Unremarkably when we sit downwards, we would be presented with the menu," she began. "My friend would then read the menu, place her gild, and and so she would read the menu to me. […] This time was very different. I had OrCam with me, and I was able to read the menu myself. I was able to place my lodge. […] I was able to continue my conversation with my friend, without my friend being focused on my disability.
"For the get-go fourth dimension since losing my sight, I was able to feel like a normal person."
We tend to overlook how complex a process it is to see things. Look around you at any moment and you'll see an endless stream of objects, people, and events that you may or may not recognize. A coffee-stained mug on your desk. A jogger plodding down the sidewalk. The headlights of a car in the distance.
Our eyes are regularly bombarded with new and often unfamiliar information. Yet for most people, sight is a pop quiz that's almost impossible to fail.
The technologies that will help the visually impaired aren't quite at a human level yet, but through a sophisticated barrage of trial and error, they're catching up. Earlier this year, when one of today's leading seeing computers was shown an image of a man eating a sandwich on the street, it described the scene as, "A man is talking on his cellphone as another man watches."
The error, silly as it may seem to us, is a major leap forward for technology. But to get to the point where a computer could confuse a sandwich for a cellphone, the computer had to understand that a cellphone is a handheld object that people typically hold close to their faces. The process of learning those kinds of patterns—whether it's what a cellphone is or what a face looks like—is known in the artificial intelligence community as deep machine learning, or deep learning.
Much like teaching a child without any frame of reference, deep learning involves researchers feeding computers massive amounts of data, from which they start to form what we might call understanding. Mobileye's cars, for instance, were trained to navigate the intricacies of moving traffic. OrCam was taught to read. IBM's Watson has been fed the world's leading cancer research by oncologists at the Memorial Sloan Kettering Cancer Center. Meanwhile, companies such as Google, Facebook, and Amazon are exploring how to use deep learning in their quests for hyper-personalized experiences.
Deep learning allows computers to act independently on learned knowledge and get smarter as they encounter more of it. In the case of computer vision, it allows them to recognize what they're seeing.
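To make that abstract idea concrete, here is a minimal, hypothetical sketch in PyTorch of the training loop at the heart of deep learning: a small network looks at labeled images, measures how wrong its guesses are, and nudges its internal weights to do better next time. The data, network, and numbers are invented for illustration; this is not Mobileye's or OrCam's actual code.

```python
# A minimal, illustrative sketch (not Mobileye's or OrCam's actual code):
# a tiny convolutional network learns to label images by seeing many
# examples and adjusting its weights after each mistake.
import torch
import torch.nn as nn

# Dummy dataset: 64 random 32x32 RGB "images", each labeled as one of 3 classes.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 3, (64,))

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 3),       # scores for 3 classes
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):              # "massive amounts of data" in miniature
    optimizer.zero_grad()
    scores = model(images)
    loss = loss_fn(scores, labels)  # how wrong the guesses were
    loss.backward()                 # propagate the error back through the net
    optimizer.step()                # nudge the weights to do better next time
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```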
The sandwich-cellphone gaffe was published in a paper by researchers in Canada who displayed images to a computer and asked it to tell them what it saw in plain English or, in AI terms, natural language. They tracked the way the computer studied an image, where it looked and when, as it translated the image into a description in real time. That ability—to coherently describe things as we see them—has long been a holy grail of sorts for computer vision scientists.
"It amounts to mimicking the remarkable man ability to compress huge amounts of salient visual data into descriptive language," the newspaper'south authors wrote.
The computer accurately described pictures of "A giraffe standing in a woods with copse in the groundwork" and "A group of people sitting on a boat in the h2o." It identified "A little girl is sitting on a bed with a teddy carry," though she was actually on a burrow. It besides mistook a violin for a skateboard.
"There are analogies to be fabricated between what the model is doing and what brains are doing," said Richard Zemel, a computer scientist at the University of Toronto who contributed to the study. But different brains, deep learning computers don't make the aforementioned error twice. They larn, but they never forget.
That superpower is behind a recent article in Re/code profiling Facebook's, Google's, and the rest of the technology industry's growing investment in the field. "AI experts suggest that deep learning could soon be the backbone of many tech products that we use every single day," wrote Mark Bergen and Kurt Wagner.
That vision of the technology isn't ready for primetime yet, and its applications are still being developed on Silicon Valley research campuses. But its potential to support the visually impaired is obvious to OrCam's Shashua.
"Assume you are out in that location, you are outdoors," he said in his speech. "You have lost orientation completely. Y'all want the system to tell me what I meet. Every frame, every second, tell me what I see. I run into a tree. I see a chair. I come across people. […] This is something that is at the cut edge of research today. This is something that can be done."
Computer vision has already conquered text. It's even outperformed humans at identifying images. But to match human sight, translating the amorphous visual field of life happening in real time is the next frontier.
In 2014, Google debuted something called Project Tango at its annual I/O conference in San Francisco. It promised to give "a mobile device the ability to navigate the physical world similar to how we do as humans." Should Google deliver on this promise, it would be a major leap in the evolution of computer vision and a significant stride toward fulfilling Shashua's claim that it will change our world.
Navigating like humans requires a spatial awareness that, as with vision, we take for granted. When we walk up a staircase in an unfamiliar building, then turn a corner and enter a room for the first time, we know how to get back downstairs. Without thinking about it, we automatically mapped out the space as we passed through it, and, until recently, computers could do no such thing.
The skill is known as simultaneous localization and mapping, or SLAM, and "Researchers in artificial intelligence have long been fascinated (some would say obsessed) with the problem," wrote Erik Brynjolfsson and Andrew McAfee in their futuristic classic The Second Machine Age. In 2010, Microsoft scientists cracked the SLAM code and ushered in a flood of robotics innovation over the last several years.
Project Tango has emerged as a frontrunner in bringing SLAM technology into our day-to-day lives. Through partnerships with Nvidia and Qualcomm, Google researchers have developed tablets and smartphones with sensors that work with a camera to navigate and manipulate the space around them. Using the technology, drones can now explore the interior of buildings on their own, while gamers can transform their rooms into virtual reality forests and their friends into floating heads.
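The mapping half of SLAM can be illustrated with a short, hypothetical sketch: given the device's estimated pose and a depth image from its sensor, every pixel can be projected into a shared 3D map of the room. The camera parameters and data below are invented for illustration; Tango's real pipeline also has to estimate the pose itself, continuously and in real time.

```python
# A toy illustration of the "mapping" half of SLAM (not Tango's actual code):
# given the device's estimated pose and a depth image from its sensor, each
# pixel can be projected into a shared 3D map of the room.
import numpy as np

FX = FY = 200.0          # assumed focal lengths of the depth camera, in pixels
CX, CY = 160.0, 120.0    # assumed optical center of a 320x240 depth image

def depth_to_points(depth, pose):
    """Back-project a depth image (meters) into world coordinates.
    `pose` is a 4x4 matrix giving the camera's position and orientation,
    which a real SLAM system would estimate continuously as the device moves."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    x = (us - CX) / FX * depth            # pinhole camera model
    y = (vs - CY) / FY * depth
    points_cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1)
    points_world = points_cam.reshape(-1, 4) @ pose.T
    return points_world[:, :3]

# Fake data: a flat wall 2 meters away, seen from a camera that has moved
# half a meter to the right since the map's origin was set.
depth = np.full((240, 320), 2.0)
pose = np.eye(4)
pose[0, 3] = 0.5
world_map = depth_to_points(depth, pose)
print(world_map.shape)   # (76800, 3) points added to the running 3D map
```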
It's Time for Smartphones to Think for Themselves
How Qualcomm is bringing humanlike cognition to mobile devices
Since Alan Turing's seminal 1950 paper Computing Machinery and Intelligence asked, "Can machines think?" filmmakers have fascinated us by imagining a world in which they do.
Back in the 1960s, the creators of "The Jetsons" imagined a playful earth where Rosie the Robot cleaned our houses, washed our dishes, and played ball with our kids. More recently, Hollywood dreamed upwardly the Samantha character for the movie "Her"—not a robot, but software created to practice her owner's behest, from managing calendars to providing emotional back up in darker times. Samantha even had the capacity to acquire, adapt, and evolve. She was indistinguishable from a human existence, other than that she happened to be software inside a small-scale, supersmart smartphone.
Last year, Benedict Cumberbatch reintroduced Turing's famous question to the public with "The Imitation Game." Months later, we wrestled with it again in "Ex Machina," which centers on an interrogation with a robot to see if it can, in fact, think for itself.
Hollywood's visions of artificial intelligence are still, in many ways, a fantasy. But in recent years, we've seen the technology take early strides toward making these visions a reality. Breakthroughs in cognitive computing—an industry term for technologies such as machine learning, computer vision, and always-on sensing—are rewiring our smartphones to become capable of sensing like humans do, evolving beyond call-and-response technologies such as Siri and Cortana to a more sophisticated interplay between machines and their users.
Powered by the rapidly evolving field of cognitive computing, the devices we use every day will soon be able to see, hear, and process what's around them—enabling them to do more for us, while requiring less from us.
At Qualcomm, through its cognitive computing research initiative, researchers are leading the field of machine learning to make these ambitions a reality. A branch of machine learning called deep learning is demonstrating state-of-the-art results in pattern-matching tasks. This makes a deep learning–based approach ideal for giving our devices humanlike perceptual pattern-matching capabilities.
With every word we speak to our devices, machine learning will help these machines better comprehend the quirks of our speech, and with every route we travel, they'll better understand the places that matter most to us. As our devices passively gather more data from us, they'll perfect the recommendations they make, the content they send us, and the functions they perform until they're our own brilliant and futuristic personal assistants.
"We're trying to basically mimic what humans exercise," says Maged Zaki, manager of technical marketing at Qualcomm. "We're […] trying to give them sight, and nosotros're trying to give them ears and ways to sense the surround and experience and touch all that, basically all the senses that we equally human beings have."
One of the biggest challenges for Qualcomm's team was how to harness the elaborate processing power required by deep learning and shrink it down onto a pocket-sized device.
"[Today'southward] machine learning is very compute intensive," explains Zaki. "It basically entails big servers on the cloud, and running the algorithms and training the machines days and days on the network to be able to recognize images."
Putting forms of deep learning onto a phone requires not just a firm grasp of deep learning itself but a knack for working in tight spaces. Qualcomm's innovation has unlocked the way to put these power- and compute-intensive features completely on a chip inside a smartphone. As a result, phones will no longer need to rely completely on the cloud to outsource all their most daunting computing, which drains today's phone batteries and pushes phones to their technical limits.
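Qualcomm's on-chip approach isn't detailed here, but one common way to shrink a trained network so it fits on a pocket-sized device is quantization: storing weights as 8-bit integers instead of 32-bit floats. The generic PyTorch sketch below shows the idea on a stand-in model with made-up weights.

```python
# One common way to shrink a trained network so it fits on a device is
# quantization: storing weights as 8-bit integers instead of 32-bit floats.
# This is a generic PyTorch sketch of the idea, not Qualcomm's on-chip method.
import io
import torch
import torch.nn as nn

# A stand-in for a trained model (weights here are random for illustration).
model = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Convert the linear layers' weights to int8; activations are quantized
# on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m):
    """Rough size of a model's saved parameters, in megabytes."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"float32 model: {size_mb(model):.2f} MB")
print(f"int8 model:    {size_mb(quantized):.2f} MB")  # roughly 4x smaller
print(quantized(torch.randn(1, 512)).shape)           # still runs the same way
```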
Machines that think like humans are still in their adolescence—recently, one of the most powerful artificial intelligence machines mistook Facebook CEO Mark Zuckerberg for a cardigan—but on-device machine learning will begin to push computer intelligence out of its awkward stage in the coming years. Our interactions with our devices will become far more natural: We'll eschew keyboards in favor of commands based on voice, gesture, or vision that work reliably.
The idea of always-on devices, able to listen to us and watch our every move, can send even the most tech-savvy person into a state of paranoia. But Zaki says that locked within these very same advances in cognitive computing are the solutions to better protecting our security and privacy in an increasingly connected world. "Instead of being scared of machine learning and having so many sensors on the device, we would like to use these technologies to actually enhance privacy and security," he says.
A phone with humanlike "awareness" would notice suspicious activity, such as malware infiltrating our contact lists or credit card information, even when we're not even using the phone, and information technology would alert u.s.a.—or automatically end this from happening altogether. Zaki believes that machine learning will too make security and hallmark far more than convenient, as phones could use background verification of our fingerprints equally nosotros type, for example. "Our vision is that hallmark should be happening in the background continuously and seamlessly," he notes.
Soon enough, our smartphones will truly be extensions of ourselves. We won't always have to tell them what to do, as they'll know our schedules, our desires, our needs, our anxieties. These are thinking machines that "complement us, not replace us, on everyday tasks," says Zaki. "They'll expand the human ability and serve as an extension of our five senses."
Learn more about Qualcomm.
"Nosotros never know whether [Tango and similar projects] even make viable business applications," says Johnny Lee, who leads the Tango team at Google, "but nosotros want to button the technology at times because you lot don't know what's possible on the other side."
One application that's consistently mentioned in the press around Tango is assisting the visually impaired. Though such a project hasn't been described in detail, Google's partnership with Walgreens and a mobile shopping company called Aisle411 offers a clue to how it might work.
As shoppers navigate the aisles of Walgreens, a Tango-powered tablet can track their movement within centimeters to identify where specific products are located within the store and on the shelves. Whether all shoppers want or need this sort of hyper-customized in-store guidance is up for debate, but for the visually impaired, it could transform their lives outside the home.
Devorah Shaltiel, the OrCam user who spoke at the conference, described the difficulties she faces at the grocery store. From even just a few feet away, a ketchup bottle would look like "a fuzzy red blob," and reading the labels on similar-looking items—such as different types of cookies from the same brand—would be all but impossible. Today, once she finds the item she's looking for, OrCam solves the latter problem. But combining that technology with Project Tango could help Shaltiel find the items more efficiently and independently to begin with.
For the visually impaired, independence is what matters. OrCam, which has the tagline "OrCam Gives Independence," launched to the public last year and has a waiting list of about 10,000 people for a $3,500 device. In his speech, Shashua played emotional videos of people who have had their lives transformed just by being able to open their own mail, read a book, and identify the value of a dollar bill they're holding.
"We are on the right runway," Shashua said at the end of his seminar.
That track, he suggests, leads to the continued convergence of deep learning and computer vision, where devices can not only translate images into plain English, but also examine and describe the physical world around us.
While most of these technologies are in their research lab beta phases, they'll emerge eventually. And when they do, they're likely to do so in force—with applications that extend beyond helping the visually impaired.
"Present, you lot wouldn't consider buying a phone without GPS," Google's Lee says. "We hope to encounter Tango kind of reach the same level of adoption."∎
Source: https://www.theatlantic.com/sponsored/qualcomm/the-space-without/542/
0 Response to "Can You See the Grand Canyon From Space"
Post a Comment