word2vec revolutionises geography

Google Research’s word2vec tool, quoting from https://code.google.com/p/word2vec/, “provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words.”

It produces some curious results. In the example output below, Russia comes out closer to France than Germany does:

A simple way to investigate the learned representations is to find the closest words for a user-specified word. The distance tool serves that purpose. For example, if you enter ‘france’, distance will display the most similar words and their distances to ‘france’, which should look like:

                 Word       Cosine distance
-------------------------------------------
                spain              0.678515
              belgium              0.665923
          netherlands              0.652428
                italy              0.633130
          switzerland              0.622323
           luxembourg              0.610033
             portugal              0.577154
               russia              0.571507
              germany              0.563291
            catalonia              0.534176
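
Note that the column labelled "Cosine distance" behaves like a similarity score: higher values mean closer words, which is why Russia (0.571507) ranks above Germany (0.563291). The ranking can be sketched in a few lines of Python. This is a minimal illustration and not the distance tool's own code: the function names `cosine_similarity` and `nearest_words` are hypothetical, and the three-dimensional vectors are invented for the example, whereas real word2vec vectors are learned from text and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(query, vectors, top_n=10):
    """Rank every other word by cosine similarity to `query`, highest first."""
    query_vec = vectors[query]
    scored = [
        (word, cosine_similarity(query_vec, vec))
        for word, vec in vectors.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors, made up for illustration only (not real word2vec output).
toy_vectors = {
    "france": [1.0, 0.2, 0.0],
    "spain": [0.9, 0.3, 0.1],
    "russia": [0.5, 0.8, 0.2],
    "germany": [0.4, 0.9, 0.3],
}

if __name__ == "__main__":
    for word, score in nearest_words("france", toy_vectors):
        print(f"{word:>20} {score:>18.6f}")
```

With these toy vectors the ranking is spain, russia, germany, mirroring the oddity in the table above: the score measures how similar the vectors are, not how near the countries lie on a map.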

 

from http://www.kalimedia.com/Kartographie.html