
Starmind adds Chinese language support to better serve global users

Written by Stijn Vermeeren | Jul 15, 2019 10:00:00 PM

Users in more than 100 countries use Starmind to leverage the collective intelligence of their organizations and to access the internal experts they need to get work done. With this in mind, we strive to make Starmind a platform that as many people as possible around the world can use, and in doing so, better serve our global enterprise customers and communities.

Introducing Chinese language support in Starmind was a multi-step process that involved close collaboration between our AI and Product teams. The first factor to consider is that there are two written forms: Traditional (which is dominant in Hong Kong, Macau, Taiwan, and overseas Chinese communities) and Simplified (which was introduced by the People’s Republic of China in the 1950s and 60s to promote literacy and is now dominant in mainland China, Malaysia, and Singapore). There is quite a degree of variation between these two written forms: for example, the word for "language" is written 語言 in Traditional and 语言 in Simplified Chinese.

Both the Traditional and Simplified Chinese scripts contain thousands of different characters. Consequently, the information density per character is a lot higher in Chinese text than in European languages. Most Chinese words consist of just one, two or three characters. Chinese does not use any spaces between words and also uses its own set of punctuation symbols. As such, correctly segmenting sentences into individual words is considerably more challenging for Chinese than for European languages.
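To make the segmentation problem concrete, here is a minimal sketch of forward maximum matching, one of the simplest dictionary-based approaches. It is not a description of Starmind's actual segmenter: the vocabulary, the function name `segment`, and the three-character limit are assumptions chosen purely for illustration.

```python
# Toy vocabulary (an assumption for this sketch); a real segmenter would rely
# on a large dictionary or a statistical model.
VOCAB = {"我们", "支持", "中文", "搜索"}
MAX_WORD_LEN = 3  # most Chinese words are one to three characters long


def segment(text, vocab=VOCAB, max_len=MAX_WORD_LEN):
    """Split text into words by greedily taking the longest known word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        # when no dictionary word starts at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


print(segment("我们支持中文搜索"))
# ['我们', '支持', '中文', '搜索']
```

Greedy matching like this breaks down on ambiguous character sequences, which is one reason word segmentation is harder for Chinese than for languages that separate words with spaces.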

Another adaptation we had to make was to augment our search algorithms. To support search in Chinese we leverage a dedicated Elasticsearch index. The Chinese search index is able to offer results in both written forms, Traditional and Simplified, regardless of which written form was used for the search query.
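One common way to get this kind of script-independent matching is to normalize both the indexed text and the query to a single written form. The sketch below illustrates the idea in plain Python. It does not show how Starmind's Elasticsearch index is configured, and the names `T2S`, `normalize`, `build_index` and `search` are hypothetical. The conversion table covers only a handful of characters, and real conversion is not always one-to-one (for instance, both 發 and 髮 map to 发).

```python
# Tiny illustrative Traditional -> Simplified table (an assumption for this
# sketch); a production system would use a complete conversion resource.
T2S = {"語": "语", "問": "问", "題": "题", "見": "见"}
T2S_TABLE = str.maketrans(T2S)


def normalize(text):
    """Map Traditional characters to their Simplified counterparts."""
    return text.translate(T2S_TABLE)


def build_index(documents):
    """Store each document with its normalized form next to the original."""
    return [(normalize(doc), doc) for doc in documents]


def search(index, query):
    """Return the original documents whose normalized text contains the query."""
    key = normalize(query)
    return [original for normalized, original in index if key in normalized]


docs = ["常見問題", "常见问题", "語言支持"]
index = build_index(docs)

print(search(index, "问题"))  # Simplified query
print(search(index, "問題"))  # Traditional query, same results
```

Because the query and the documents pass through the same normalization, a search typed in either written form returns documents written in both.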

Tag Extraction and Autocomplete suggestions were also areas of our algorithms that required specific logic to deal with Chinese. Thankfully, Chinese is a so-called isolating language, meaning that there is no inflection of words (for example, no conjugation of verbs). Consequently, many steps in our natural language processing pipeline are actually simpler for Chinese than for the other languages already supported by Starmind (English, German, Spanish, French and Italian).

At Starmind, our vision is to make collective human intelligence accessible to everybody. With over 1 billion native speakers, Chinese is an important addition, and supporting it is one more step toward achieving that vision and better serving our customers. 再见!