Combining vision and language could be the key to more capable AI – TechCrunch
Depending on the theory of intelligence to which you subscribe, achieving “human-level” AI may require a system that can leverage multiple modalities (e.g., sound, vision and text) to reason about the world. For example, when shown an image of a toppled truck and a police cruiser on a snowy freeway, a human-level AI…