Here is a direct link to the English corpus containing all the articles from the 2014 data dump.
The file contains comma-separated values in the form
id,word,frequency
The frequency is not normalized; it is recorded as an integer giving the number of occurrences in the cleaned version of Wikipedia. Since there are only 65536 words, it is easy to load this information and normalize it on the fly.
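Loading and normalizing the word list can be sketched as follows. This is a minimal illustration, not the author's code; the function name and file path are assumptions.

```python
from collections import namedtuple

Word = namedtuple("Word", ["id", "word", "count"])

def load_word_list(path):
    """Load id,word,frequency rows and compute normalized frequencies on the fly."""
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            wid, word, count = line.rstrip("\n").split(",")
            words.append(Word(int(wid), word, int(count)))
    total = sum(w.count for w in words)          # total occurrences across all 65536 words
    freqs = {w.id: w.count / total for w in words}  # normalized frequency per word ID
    return words, freqs
```

With only 65536 rows, the whole table fits comfortably in memory, so normalizing at load time costs essentially nothing.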
This file contains one encoded article for each of the 65536 words. Each article is encoded as a comma-separated list of IDs representing words; the IDs correspond to the word list linked above. Articles are sorted in descending order by word ID. Each article was chosen to maximize the frequency of the particular word it represents. Each line is written in the following form:
word-ID,1st-word-ID,2nd-word-ID, ... ,nth-word-ID\n
word-ID is the word the article corresponds to
1st-word-ID,... are the words of the article itself, encoded as word IDs
n is the number of words in the article
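Parsing a line of this format and decoding it back to words can be sketched like this. The function names and the id-to-word mapping are illustrative assumptions, not part of the dataset.

```python
def parse_article_line(line):
    """Split 'word-ID,1st-word-ID,...,nth-word-ID' into (word_id, article word IDs)."""
    ids = [int(tok) for tok in line.rstrip("\n").split(",")]
    return ids[0], ids[1:]

def decode_article(word_ids, id_to_word):
    """Map an article's word IDs back to word strings using the word-list table."""
    return [id_to_word[i] for i in word_ids]
```

The first ID on each line names the word the article stands for; everything after it is the article body, so n is simply the length of the remaining list.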
The neocortex is complex. Within its 2.5 mm thickness are dozens of cell types, numerous layers, and intricate connectivity patterns. The connections between cells suggest a columnar flow of information across layers as well as a laminar flow within some layers. Fortunately, this complex circuitry is remarkably preserved in all regions, suggesting that a canonical circuit consisting of columns and layers underlies everything the neocortex does. Understanding the function of the canonical circuit is a key goal of neuroscience.
At the heart of Hierarchical Temporal Memory (HTM), our machine intelligence technology, are time-based learning algorithms that store and recall spatial and temporal patterns. This paper describes how the learning algorithms work and their biological mapping.
The two faculties - making analogies and making predictions based on previous experiences - seem to be essential and could even be sufficient for the emergence of human-like intelligence.