(Korean) Text Analysis in R and Pajek [INCOMPLETE]

R and its almost endless library of packages and plug-ins (CRAN) mean that you can do almost anything in R, including text analysis and network analysis. While you could do everything in R, that doesn’t mean you should. Specialized network analysis software can also be very useful when interpreting, analyzing or visualizing a network, as opposed to trying to automate everything with an R script. You don’t have to be monogamous: you can love R and you can love other software too.

The following tutorial explains how R can be used for text analysis (including creating word clouds) and how the resulting network can then be exported, so you can analyze it in Pajek.

  • Loading the corpus (text), processing it and doing basic analysis (word counts) is done using the quanteda package (detailed guide here)
  • Making a Word Cloud is done using ggplot2 (detailed guide here)
  • We then show you how to export the network to Pajek, a popular network analysis and visualization program that is free for non-commercial use (official page here)

You can install the aforementioned R packages by typing:

install.packages('quanteda')
install.packages('ggplot2')

Even if you installed the packages earlier, running the install command again does no harm: it simply installs the latest available version.

Loading Text Corpus

First you need to load the text you want to analyze into R. In this particular example the text consists of comments scraped from a website and stored in a TXT file (which you can open in Windows with Notepad). Every line is a new comment. The filename is ‘comments.txt’.

In the first code chunk below, we start by loading the package quanteda using the library command.

Then we import the text from comments.txt into a data frame using the read.csv command. This command is used to load CSV files (“comma separated values”, a kind of spreadsheet). Because commas are the default separator in a CSV file, and our comments might contain commas, we need to choose a different separator so that a comment is not split into several columns: basically any character that we are sure won’t appear in the text. In this example we set sep = "|".

For easier processing we name the column containing the text ‘text’ using the names command.

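library(quanteda)
# header = FALSE keeps the first comment from being read as a column name;
# quote = "" stops quotation marks inside comments from merging lines
textfile <- read.csv('comments.txt', sep = "|", header = FALSE, quote = "", stringsAsFactors = FALSE)
names(textfile) <- c('text')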
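Because every line of the file is one comment, an alternative that sidesteps the separator question altogether is base R's readLines. The sketch below is only an illustration and not part of the original workflow: it writes a small stand-in file (the sample comments are made up) so that it runs on its own. In practice you would point readLines at comments.txt instead.

```r
# Stand-in for comments.txt: three made-up comments, one per line
path <- tempfile(fileext = ".txt")
writeLines(c("Great movie, really enjoyed it!",
             "Too long | but the ending was good",
             "Not my thing."), path)

# readLines returns one character string per line, so commas, pipes or
# quotes inside a comment cannot split it into extra columns
comments <- readLines(path, encoding = "UTF-8")

# Same shape as in the tutorial: a data frame with a single 'text' column
textfile <- data.frame(text = comments, stringsAsFactors = FALSE)
nrow(textfile)
```

The resulting data frame can be labelled and passed to corpus() in the same way as the one produced by read.csv.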

Next we add a comment_ label to each piece of text (quanteda keeps extra columns like this one as document-level variables, and it can also do more complex text analysis than what is shown in this example). We then view a summary of the text corpus. You can also click on text.corpus in RStudio to see what’s inside.

textfile$label <- paste0('comment_', row.names(textfile))
text.corpus <- corpus(textfile)
summary(text.corpus)

Tadaaa! We have imported our text corpus. Now it’s time to process!

Text Processing

The next bit of code lets you process the text, basically cleaning it up.

We begin by removing punctuation and numbers, because they are not important in this particular situation.

text.tokens <- tokens(text.corpus, remove_punct = TRUE, remove_numbers = TRUE)

Then we remove stop words like “the”, “a”, etc. which we do not want to analyze. quanteda is awesome in that it has libraries of commonly used stop words for multiple languages. See here. In this case we use Korean stopwords (language ko) from the marimo repository.

text.tokens <- tokens_select(text.tokens, stopwords('ko', source='marimo'), selection='remove')

The list of stopwords should be critically assessed. The scraped website comments that we used to try this out were filled with slang, for example, so we added many additional stopwords. An example of the kind of Korean-language stopwords that might need to be added can be found in this example of Korean text analysis with quanteda on Github. You can also identify stopwords from the word frequency analysis (see below).
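Adding your own stopwords could look roughly like the sketch below. It is an illustration under stated assumptions rather than the code used for the original analysis: the two toy comments are made up, they are in English purely for readability, and custom_stopwords is a hypothetical vector that you would fill with the slang found in your own corpus.

```r
library(quanteda)

# Two made-up comments; "lol" stands in for slang we do not want to analyze
toy <- c(comment_1 = "the movie was great great fun",
         comment_2 = "the movie was lol boring")

# Hypothetical list of extra stopwords identified by reading the comments
custom_stopwords <- c("lol")

toks <- tokens(toy, remove_punct = TRUE)
# tokens_remove() drops both the standard and the custom stopwords
toks <- tokens_remove(toks, c(stopwords("en"), custom_stopwords))

# Check the most frequent remaining words
top <- topfeatures(dfm(toks), 5)
print(top)
```

For the Korean corpus in this tutorial the same pattern would apply, with stopwords('ko', source = 'marimo') in place of stopwords("en").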

You can also do some other processing, such as reducing words to the word stem or harmonizing all words to lower case. These aren’t relevant when processing a Korean language text, but they may be relevant for other languages, such as English.

text.tokens <- tokens_wordstem(text.tokens)
text.tokens <- tokens_tolower(text.tokens)

When you’re done, you can compile all the beautifully clean processed text into a document feature matrix.

text.dfm.final <- dfm(text.tokens)

Word Frequency and Word Cloud

Finally, we can start to do the fun stuff: text analysis! As a first step it’s worthwhile to look at the word frequency analysis to see if there are any frequently used words “polluting” your analysis. For example, in an analysis about a movie, you may want to remove the title of the movie. The code for producing a word frequency data frame named wfreq, with the 100 most frequently occurring words, is below:

wfreq <- as.data.frame(topfeatures(text.dfm.final, 100)) # plain function call, so no pipe package is needed

The word frequency data can also be converted into a word cloud, whereby more frequently occurring words appear larger and in the center of the cloud.

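library(quanteda.textplots) # since quanteda 3.0 textplot_wordcloud() lives in this separate package (install it first if missing)
set.seed(132); textplot_wordcloud(text.dfm.final, max_words = 100)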

Depending on the text used, this enables you to generate word clouds that will look something like this…

Export to Pajek

(to be added)

co.matrix <- fcm(text.tokens, context = 'document', tri = FALSE) # generate co-word matrix (within same review)
feat <- names(topfeatures(co.matrix, 30)) # select top-30 words
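Since this section is still to be written, the following is only a sketch of one way the export could work, not the author's method. Pajek reads plain-text .net files that list the vertices (a number and a quoted label) followed by the edges (two vertex numbers and a weight). The helper write_pajek and the toy matrix are hypothetical names and data introduced here for illustration, using base R only.

```r
# Hypothetical helper: write a symmetric co-occurrence matrix as a Pajek .net file
write_pajek <- function(m, path) {
  m <- as.matrix(m)
  labels <- rownames(m)
  n <- length(labels)
  lines <- c(sprintf("*Vertices %d", n),
             sprintf('%d "%s"', seq_len(n), labels),
             "*Edges")
  # Undirected network: keep each pair once (upper triangle), skip zero weights
  idx <- which(upper.tri(m) & m > 0, arr.ind = TRUE)
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]
  if (nrow(idx) > 0) {
    lines <- c(lines, sprintf("%d %d %g", idx[, "row"], idx[, "col"],
                              as.numeric(m[idx])))
  }
  writeLines(lines, path)
  invisible(lines)
}

# Toy co-occurrence matrix with made-up counts for three words
words <- c("movie", "actor", "story")
toy <- matrix(c(0, 2, 1,
                2, 0, 0,
                1, 0, 0),
              nrow = 3, byrow = TRUE, dimnames = list(words, words))

out <- write_pajek(toy, tempfile(fileext = ".net"))
cat(out, sep = "\n")
```

With the objects from this tutorial, a call along the lines of write_pajek(fcm_select(co.matrix, feat), "comments.net") would be the natural next step, although that has not been run against the original data.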

Dutch Pillarisation, Malaysian Rojak

Pillarisation (or verzuiling in Dutch) is the state of a society that is divided into groups that self-segregate. Until the 1960s and 1970s, the Netherlands was a country whose population was divided along sectarian lines. There was a Catholic pillar, a Protestant pillar, a Socialist pillar and a Liberal pillar. These groups had their own schools, broadcasters, newspapers, political parties, labor unions, employer federations, universities, hospitals, shops and sports clubs. Marriage and friendships between families from different pillars were either discouraged or simply not allowed. The Catholic school kids would always fight with the Protestant school kids. A good Catholic would only buy from Catholic shops. The priest or minister would make house visits to ensure everything was being done “correctly”. The Netherlands was a segregated society with extensive social control within the respective pillars.

To a Malaysian this system may seem strangely familiar, as Peninsular Malaysia has its own racial-linguistic pillars: Malay, Tamil, Chinese and English. Each has its own media outlets, political parties, educational institutions, neighborhoods, popular shopping malls, cuisine, places of worship, social clubs, chambers of commerce, etc. And while Malaysians of different races do mix regularly, especially in the workplace, the number of Malaysians who marry or maintain deep friendships across racial lines is relatively limited. Within many groups there is still a strong sense of social control, and opinion polls show that a large segment of Malaysian society is still very conservative regarding social issues.

The big difference between Malaysia and the Netherlands is racial and linguistic: the Netherlands during pillarisation was, for all intents and purposes, a mono-lingual and mono-ethnic country. Malaysia of course is multi-racial and multi-lingual. Malaysians often refer to their society as ‘Rojak’, a salad of fruit, vegetables and sometimes egg and tofu, all mixed together and covered in a sauce. The point is that each of the items in the salad retains its individual characteristics; they do not melt or assimilate into one uniform Malaysian soup or porridge. There is only a Malaysian sauce that unites them.

In the Netherlands more progressively minded individuals from within each pillar tried to break down barriers between them. The Netherlands became a much less religious and more individualistic society during the later part of the 20th century, and this weakened the social control from within the pillars. This loosening eventually led to various mergers between labor unions, political parties, broadcasters, etc. Schools began accepting students from diverse backgrounds, and inter-religious marriages and friendships lost their stigma. Today the process of de-pillarisation in the Netherlands that took place in the 1960s and 1970s is primarily seen as a social process which brought about institutional and political change.

Since the 1990s the Netherlands has been widely seen as one of the most liberal countries, having legalized prostitution, soft drugs, gay marriage and euthanasia, all abhorred by conservatives. However, this is not to say that all religion or conservative values have disappeared. In fact, the Netherlands is also home to a substantial ‘bible belt’ of mainly conservative Protestants, who have maintained their pillars. The stereotype is of large conservative families, who attend church regularly, strictly observe Sunday as a day of rest, and in some cases, oppose modern technologies such as television and vaccinations. This diversity in views is represented in the Dutch parliament: there is a conservative political party that would like to deny women the right to vote, a party for animals, a party that would like to deport all Muslims and recently, a party that fights (only) for the rights of Muslims.

So does the Dutch experience suggest that Malaysia will inevitably de-pillarise? That the ‘Rojak’ will become ‘Laksa’ or ‘Bubur’? If anything, modern Malaysia seems to have pillarised more since independence. Many Malaysians growing up during the 1950s and 1960s in Malaysia remember a more multi-ethnic society in areas such as education or the civil service. Yet this may also have been an illusion of the elite: Malaysian society at the level of the working class was perhaps always more deeply divided along racial and religious lines. Government policy since independence has largely aimed to maintain or reinforce those divisions, perhaps primarily as a tool to maintain social control, and not dissimilar to the ‘neat’ political divisions in the Netherlands after World War II.

What should be remembered is that Dutch de-pillarisation was accompanied by a phenomenal economic transformation of the country after 1945. In Malaysia, arguably a greater degree of de-pillarisation has occurred in more prosperous urban areas, such as the Klang Valley. In those areas more multi-ethnic parties tend to perform well in elections, presumably reflecting different social values of the local population. Ethnic-based parties tend to perform better in less prosperous rural areas of Malaysia.

While Malaysia will not experience de-pillarisation in the same way that the Netherlands has, the comparison with the Netherlands suggests two things that might be relevant in a Malaysian context. First, that socio-economic changes are the main drivers of the cultural and political changes that brought about de-pillarisation. Second, that the pillars (the institutions, social bonds and ways of life) will survive, although they will lose influence.

Is Cryptocurrency a Good Investment?

With the total value of bitcoin hitting US$1 trillion this month (greater than the annual nominal GDP of The Netherlands, or the value of Facebook) cryptocurrency is an economic phenomenon to be reckoned with. But size isn’t everything and cryptocurrency’s popularity is not necessarily indicative of its long term profitability.

To answer the question of whether cryptocurrency is a good investment, it helps to first clear up what cryptocurrency is. Is it money? And if so, what kind of money?

Cryptocurrency as Money

Money, according to economists, is a store of value and a medium of exchange. Money can be a dollar, ringgit, yen, rupiah. But it can also be sea shells during the stone age or cigarettes in prison. Money is useful because it is fungible: it can quickly be converted into something that you want or need. For money to become fungible it needs to have a community of users who agree on its value and are therefore willing to accept it as payment. The value of money is what you can buy with it, after all.

The community around money is typically the nation state. The nation state issues currency via a central bank or currency board and then proceeds to spend, issue debt and tax its citizens in that currency. In most cases this is enough to persuade the vast majority of people in the country to start using government-issued money, especially if the value of that money is relatively stable. Countries which cannot manage the stability of their own currency might use another country’s “hard” currency. Think of how some countries also use US dollars, euro, pound sterling or Indian rupee alongside, or instead of, a local currency.

Aside from this government-centric view of currency, a currency can also be used because it lowers transaction costs. A prime example of this is the US dollar: when a Malaysian trades with someone from Vietnam, they will usually not deal in ringgit or dong, but in a third currency like the US dollar. This is not because the ringgit or dong are crackpot currencies (they are highly fungible in their domestic economies), but because in the foreign exchange market the US dollar has huge volumes and a low spread. There are huge volumes of US dollar-dong and US dollar-ringgit transactions daily, whereas direct ringgit-dong transactions are very rare. Therefore a transaction that converts ringgit to US dollars to dong will typically offer the lowest transaction cost.

So where does this leave cryptocurrency? Does cryptocurrency have a ‘captive’ community that accepts it as payment? Or can a cryptocurrency be used in an international setting, like the US dollar, connecting different currency communities?

Welcome to Cryptoland

Although cryptocurrencies like bitcoin are not backed by any national governments (this is part of their attraction) there are communities in which cryptocurrencies are being used.

Especially in situations involving online transactions that safeguard privacy and don’t involve any paperwork, anonymous cryptocurrency like bitcoin can be very useful. This partly explains the popularity of cryptocurrency in crime: hiring an assassin, buying weapons or drugs, paying for hacked personal data, running a ransomware business, etc. The privacy that cryptocurrencies can provide seems ideal for any situation where one wants to economically subvert a government. Its anonymity should help with tax evasion or with circumventing capital controls. While there are also legitimate uses of cryptocurrency, it’s a bit like offshore financial centers: an anonymous offshore company in an opaque jurisdiction is exceptionally useful if you’re engaged in tax evasion, corruption or money laundering.

Nevertheless, we have to admit that there is a large and presumably thriving Cryptoland where cryptocurrency is fungible and widely accepted in transactions. How big is the economy of Cryptoland? It’s estimated that crime accounts for about 3.6% of global GDP. If we assume global GDP of US$90 trillion, that would put Cryptoland’s GDP at roughly US$3 trillion (3.6% of US$90 trillion is about US$3.2 trillion) and possibly more, depending on the real size of the grey and black online economy.

Yet its use in the shadow economy will likely harm cryptocurrency’s long-term prospects, as governments will either ban it or seek to closely regulate it in ways that make it less attractive for the seedier denizens of Cryptoland, who are presumably its main users now. In terms of transactions cryptocurrency could have a cost advantage when compared to credit cards, but many countries also have very competitively priced domestic payment systems. In Malaysia online bank transfers cost around 10 sen (around 2 US cents and often waived for consumers), which is an amount that’s not easy to undercut for a cryptocurrency. Of course other payment services like credit cards, QR codes, etc. are more expensive, but ultimately payment processing seems unlikely to be cryptocurrency’s long-term competitive edge.

Bye Dollar, Hello Crypto?

If wide adoption of cryptocurrency beyond the shady shores of Cryptoland seems unlikely, is it possible that cryptocurrency will serve as a connector of national currencies, much like the US dollar is today? Especially in international transactions banking fees tend to be much higher and the opaque, old, unsafe and inefficient system of correspondent banks and SWIFT seems ripe for disruption.

This is an area where an electronic currency, something like a cryptocurrency, could offer an advantage. But the question then quickly becomes: which cryptocurrency would one use? Many of the fiat cryptocurrencies like bitcoin, ethereum or dogecoin are not backed by any assets and they tend to be highly volatile. This means that for practical transactions, such as converting ringgit to dong, using a cryptocurrency instead of the US dollar might not be the safest or smartest option.

A more popular alternative will likely be a crypto token backed by something else: say gold, silver, crude oil, palm oil or currencies like the US dollar, euro or yuan, which have a relatively stable price. If transactions in such a cryptocurrency are cheaper and easier than current cross-border transaction systems, this is probably a more viable future for cryptocurrency.

The above analysis suggests that, outside of Cryptoland, cryptocurrencies are not viable as money. But perhaps that’s not the insurmountable barrier that it seems to be. After all, gold used to be the world’s money, and although that role has largely come to an end, gold is still being held as an asset, mainly in gold bars or as jewellery (its industrial use is quite limited).

Crypto as the New Gold?

Viewing cryptocurrency as a new gold rush arguably makes the most sense: fortunes are being made and lost, it’s highly volatile, there are robbers waiting to pounce… and the people who are getting rich seem to be the equipment suppliers (NVIDIA) and the cryptocurrency dealers (Coinbase), perhaps less so the miners themselves.

The value of gold is purely based on social perception: it is valuable because people have always thought that it is valuable. Cryptocurrency seems to have a similar sociology behind it, whereby most buyers buy because they want to be owners of cryptocurrency, reflecting the idea that investment is consumption. Obviously the ownership of gold is more institutionalized and has a much longer history, but it also has a similar investment-as-consumption sociology.

So does it make a lot of sense to own cryptocurrency if it is like gold? Consider these two observations:

  • Gold is not a great investment over the very long term. Stocks have historically had better returns (dividends included), although there have been periods where gold did out-perform stocks. If you do invest short-term, are you confident that you will make a profit and not lose your shirt in a very volatile and unregulated market?
  • There’s not just one cryptocurrency; in fact there are hundreds of cryptocurrencies. So it’s possible that you won’t buy the winner, perhaps because the winner hasn’t been launched yet. Before Google, Facebook and Amazon, there were hot internet services like Netscape, Compuserve, ICQ, Lycos and Friendster. “Whatever happened to them?” you may wonder. The same may apply to your cryptocurrency investment.

So as a serious investment, to which you may wish to divert a substantial part of your savings, cryptocurrency does not seem to fit the bill. But if you have money lying around that you would otherwise have just blown at the casino… then perhaps it’s not the worst idea. At least it will give you something to talk about at parties.

Index Investing at Bursa Malaysia

Index investing is very popular, especially in Europe and North America but perhaps less so in Asia. The idea behind index investing is simple: because an average investor cannot beat the market consistently over long periods of time, you are best off putting your money in a well-diversified investment fund and spending your time instead on other things, like playing with your kids, improving your golf handicap, updating your social media accounts, running your business, etc.

Research suggests that professional fund managers also don’t beat the index consistently, so you will probably also fail to beat the index if you try to pick the “best” actively managed mutual fund (for background get a hold of Malkiel’s A Random Walk Down Wall Street and Bogle’s Common Sense on Mutual Funds). Since managed mutual funds tend to have higher charges, that’s another reason to choose an index fund.

But all of this research is typically based on the realities in North America and Europe: large, liquid, (supposedly) well-regulated and sophisticated markets. What about a market like Malaysia’s that’s relatively small and often dominated by large family-owned or government-owned firms? Do index funds beat mutual funds in Malaysia too?

Let’s Hypothesize

Factors that might make Malaysia’s stock market unlike those of North America and Europe include:

  1. Malaysia’s stock market has some regulatory and governance issues (loads of examples here), although these tend to occur with smaller companies that are not included in indexes like the FTSE Bursa Malaysia KL Composite Index,
  2. Malaysia’s economy has some political patronage going on too, as well as large family-controlled businesses through which minority shareholders may get the short end of the stick when they invest in these companies (see also Gomez & Jomo’s Malaysia’s Political Economy: Politics, Patronage and Profits and Studwell’s Asian Godfathers: Money and Power in Hong Kong and South East Asia),
  3. Malaysia’s economy is largely dominated by government-owned companies. Among the 10 largest companies by market capitalization listed on Bursa Malaysia in Sept 2020, 6 have the government as a major shareholder: Tenaga Nasional, Maybank, CIMB, Petronas, Axiata and Sime Darby (see also Gomez et al’s Minister of Finance Incorporated: Ownership and Control of Corporate Malaysia). Significant government ownership means that Malaysia’s national (or political) interest may drive corporate decision-making, rather than shareholder return. If anyone needs a case study in how government-linked companies can be mismanaged: remember the FGV saga?

All of this taken together means that (similar to Singapore) the top tiers of the Malaysian stock market are typically filled with rather conservative family or state-owned firms engaged in financial services, utilities or plantations. There are no Googles, Samsungs or Tencents to drive growth like in the US, Korea or China. In such an environment stock-picking might actually be a way to consistently beat the stock market index because it would mainly involve avoiding the “dead wood” state- and family-owned businesses.

Running the Numbers

So how do we test this theory? Lacking access to any fancy data sets from Bloomberg or Reuters, I used the Fund Selector function from FundSuperMarket (FSM One), a local discount mutual fund brokerage. When selecting funds with the asset class Equity and a geographic focus on Malaysia, around 62 funds provide a total return figure (capital appreciation + dividend) for the past 10 years. Of those 62 funds, 25 are Shariah-compliant funds (including 1 index fund) and 37 are conventional funds (including 2 index funds).

There is a large difference between the conventional and Shariah funds in terms of their holdings, and that difference is financial institutions. Most of Malaysia’s large banking groups (Public Bank, Maybank, CIMB) have a conventional banking unit engaged in usury (conventional interest-bearing loans, which aren’t allowed under Shariah) and so are removed from the Shariah funds. For this reason it’s meaningful to make separate comparisons between conventional managed funds vs. index funds and Shariah managed funds vs. index funds.

Finally it must be noted that the index funds must invest in large-capitalization stocks (all follow the relevant FTSE Bursa Malaysia index) whereas managed funds can and do invest in smaller and medium-capitalization stocks. And there could be some survivorship bias: poorly performing managed funds may have been closed down. Nevertheless, we attempt a comparison…

From the results it appears that in both categories (conventional and Shariah) the managed funds on average outperform the index funds. For conventional funds this gap is 3.61 percentage points but for Shariah funds it’s just 0.12. So in that sense our hypothesis that Malaysia is different from North America and Europe seems to hold, although for the Shariah funds only by 0.12 percentage points, which may be within the margin of error.

Also noteworthy is that the Shariah index fund (average +5.12%) has significantly outperformed the conventional index funds (average +1.98%). Added up over 10 years, that’s a stunning gap of 31.4 percentage points. If you had put your money in a Fixed Deposit you would probably have beaten conventional index funds over the last 10 years. Ouch!

| Fund Type | 10 Year Average Annual Return |
|---|---|
| Conventional Index Funds (2 funds) | +1.98% |
| Conventional Managed Funds (35 funds) | +5.59% |
| Performance Gap | +3.61% |

Comparing Conventional Malaysian Equity Funds (data: FSM One, 8 Feb 2021)

| Fund Type | 10 Year Average Annual Return |
|---|---|
| Shariah Index Fund (1 fund) | +5.12% |
| Shariah Managed Funds (24 funds) | +5.24% |
| Performance Gap | +0.12% |

Comparing Shariah Equity Funds (data: FSM One, 8 Feb 2021)
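The gaps quoted in this section follow directly from the two tables. A quick check in R, offered as a sketch: the variable names are made up, the 31.4 figure is reproduced as a simple sum of ten annual differences (which appears to be how it was derived), and the compounded variant at the end rests on the assumption, not stated in the source, that the averages can be treated as compound annual rates.

```r
# Average annual returns (%) taken from the two tables
returns <- c(conv_index = 1.98, conv_managed = 5.59,
             shariah_index = 5.12, shariah_managed = 5.24)

# Managed funds vs. index funds, in percentage points per year
gap_conventional <- returns[["conv_managed"]] - returns[["conv_index"]]
gap_shariah <- returns[["shariah_managed"]] - returns[["shariah_index"]]

# Shariah index vs. conventional index, summed over 10 years
index_gap_simple <- 10 * (returns[["shariah_index"]] - returns[["conv_index"]])

# Assumption: if the averages were compound annual rates instead,
# the 10-year gap in cumulative return (percentage points) would be larger
index_gap_compound <- 100 * ((1 + returns[["shariah_index"]] / 100)^10 -
                             (1 + returns[["conv_index"]] / 100)^10)

round(c(gap_conventional, gap_shariah, index_gap_simple, index_gap_compound), 2)
```

Either way the ranking is the same; only the size of the 10-year gap depends on how the averages were calculated.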

Analysis

So how can we better understand these surprising figures of out-performing managed and Shariah funds compared to the conventional index funds?

Two quick explanations come to mind. First, the conventional FTSE Bursa Malaysia KLCI index contains 3 large financial groups which are not a part of the Shariah index. Those financial institutions seem to have been a large drag on the conventional index funds’ returns.

Second, the fees of the index funds are substantial. Whereas many popular index funds in North America or Europe charge around 0.2% in management fees, the fees charged by Malaysian index funds, especially the conventional ones, are high. RHB’s KLCI Tracker charges 1.50% per year (comparable to a Malaysian managed fund) and Principal’s KLCI-Linked Fund charges a little less, but still 0.95%. PMB’s Shariah Index Fund only charges 0.60% as a management fee, which makes a big difference, especially when fees compound over the long term.
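To get a feel for how much these fee differences matter once they compound, here is a rough sketch. The fee levels come from the paragraph above, but the 6% gross annual return, the RM 10,000 starting amount and the assumption that the fee is deducted once a year from the fund's value are all hypothetical choices made for illustration, not figures from the funds themselves.

```r
# Annual management fees mentioned in the text (as fractions)
fees <- c(us_eu_typical = 0.0020, pmb_shariah_index = 0.0060,
          principal_klci_linked = 0.0095, rhb_klci_tracker = 0.0150)

gross_return <- 0.06   # hypothetical gross return per year
start_amount <- 10000  # hypothetical starting investment in RM
years <- 10

# Each year the fund grows by the gross return, then the fee is deducted
final_value <- start_amount * ((1 + gross_return) * (1 - fees))^years

# Cost of the fee after 10 years compared with a fee-free investment
fee_cost <- start_amount * (1 + gross_return)^years - final_value

round(cbind(final_value, fee_cost))
```

Under these assumptions the ordering is what matters: the higher the annual fee, the larger the share of the 10-year gain that never reaches the investor.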

Therefore the out-performance of Malaysian managed funds during the past 10 years seems to be largely explained by their ability to avoid investing in some large but under-performing stocks which may have structural governance issues by virtue of their family ownership, government ownership or some kind of political exposure. A profit-maximizing fund manager may therefore be able to avoid these stocks, but they nevertheless dominate the index because the size of the firms and ownership by government investment funds keeps their market capitalization high.

So what’s a lay investor to do? Well, managed mutual funds may not be a bad way to invest in the Malaysian stock market after all. Now the next challenge: figuring out which ones to buy.

Disclosures

The data used to make the calculations is available here.

I am a customer of FSM One and own fund(s) that were included in this analysis but I did not receive any compensation from FSM One, fund managers or any other related party in relation to writing this piece.

The Regional Economies of ASEAN

The 10 members of the Association of Southeast Asian Nations (ASEAN) are vastly different in terms of their size, level of economic development, religion, language and cultures. Within the larger member states, such as Indonesia, the Philippines or Vietnam, there are also great differences between regions. This post is a very brief analysis of the regional economies of ASEAN based on the data from Wikipedia’s List of ASEAN country subdivisions by GDP.

There are some caveats to comparing regions (subdivisions): they differ in population and geographic size, some economic areas cross domestic and international boundaries (think about SiJoRi, Singapore-Johor-Riau), and Wikipedia’s list seems incomplete (where did Kelantan go?). The comparison nevertheless provides an idea of what the major economic centers of ASEAN are. The comparison uses Purchasing Power Parity (PPP) statistics to account for the often large differences in cost of living.

ASEAN’s Largest Metropolitan Economies

The list of ASEAN’s largest metropolitan economies is shown in the table below and lists Jakarta as the largest metropolitan economy (by far), followed by Singapore, Bangkok and Manila (all in the US$500-600 billion range) with Malaysia’s Klang Valley in #5 position. Immediately noticeable are the large differences in population size: Jakarta’s more than 33 million population compared to Singapore’s population of “only” 5.7 million.

At the bottom of the table, Surabaya (#2 in Indonesia) is similar in economic size to Ho Chi Minh City (#1 in Vietnam), and Hanoi (#2 in Vietnam) is similar in size to Bandung (#3 in Indonesia).

| Rank | Metropolitan Region | Population | GDP-PPP (US$ million) |
|---|---|---|---|
| 1 | Jakarta metropolitan area | 33,926,330 | 978,490 |
| 2 | Singapore | 5,670,180 | 585,060 |
| 3 | Bangkok Metropolitan Region | 15,931,300 | 575,160 |
| 4 | Greater Manila Area | 25,766,930 | 528,440 |
| 5 | Klang Valley (Kuala Lumpur) | 8,026,970 | 372,560 |
| 6 | Surabaya metropolitan area | 9,885,400 | 250,460 |
| 7 | Ho Chi Minh City metropolitan area | 13,848,400 | 244,820 |
| 8 | Hanoi Capital Region | 9,957,100 | 144,810 |
| 9 | Bandung metropolitan area | 8,598,530 | 122,780 |

ASEAN’s largest metropolitan economies (2017-2019), source: Wikipedia
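Dividing each metropolitan GDP by its population gives a first impression of how different these economies are per head. The sketch below assumes that the GDP figures in the table are in millions of US dollars, which is what the text implies when it places Singapore, Bangkok and Manila in the US$500-600 billion range. The column names are made up for illustration.

```r
# Figures copied from the table of metropolitan economies
metro <- data.frame(
  region = c("Jakarta metropolitan area", "Singapore",
             "Bangkok Metropolitan Region", "Greater Manila Area",
             "Klang Valley (Kuala Lumpur)", "Surabaya metropolitan area",
             "Ho Chi Minh City metropolitan area", "Hanoi Capital Region",
             "Bandung metropolitan area"),
  population = c(33926330, 5670180, 15931300, 25766930, 8026970,
                 9885400, 13848400, 9957100, 8598530),
  gdp_ppp_million = c(978490, 585060, 575160, 528440, 372560,
                      250460, 244820, 144810, 122780),
  stringsAsFactors = FALSE
)

# GDP per capita in US$ (assumes the GDP column is in US$ million)
metro$gdp_per_capita <- metro$gdp_ppp_million * 1e6 / metro$population

# Rank the metropolitan areas from most to least prosperous per head
ranked <- metro[order(-metro$gdp_per_capita), c("region", "gdp_per_capita")]
print(ranked, row.names = FALSE)
```

These are figures for whole metropolitan areas, so apart from the city state of Singapore they need not match a per-capita ranking based on individual provinces or states.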

ASEAN’s Most Prosperous Regions

Because of the large differences between metropolitan regions in terms of their population sizes, it’s also useful to have a look at GDP per capita (PPP). Here we look not only at the metropolitan areas but at all the regions listed on the Wikipedia page because smaller non-metropolitan regions can have high income levels.

As expected Singapore ranks #1, but it’s perhaps surprising that the city state is followed by Kuala Lumpur instead of Brunei, and then Jakarta and Bangkok. This suggests that there is a large concentration of high income in the capital cities of Malaysia, Indonesia and Thailand.

As one continues down the list East Kalimantan (#6) is next and then Eastern Thailand (#7). East Kalimantan is a largely rural province of Indonesia but has a large oil & gas sector and will be home to the new Indonesian capital. Eastern Thailand lies east of Bangkok and is home to, among others, the beach resort of Pattaya and Thailand’s newly launched Eastern Economic Corridor.

Next are four Malaysian states: Penang is home to a large electronics industry; Sarawak has a small population and a large oil & gas sector; Selangor borders the Malaysian capital, Kuala Lumpur; and Malacca is a small state that is also home to an electronics industry. Samut Sakhon borders the Thai capital Bangkok. Negeri Sembilan is still within a 1-hour driving range of the Malaysian capital and the aforementioned state of Malacca. The Riau Islands border Singapore. Samut Prakan also borders Bangkok.

Thus the high-income regions of ASEAN tend to be major economic centers (e.g. Jakarta, Singapore, Bangkok, Kuala Lumpur), to be located near those large economic centers (e.g. Selangor, Samut Sakhon or Riau Islands) or to have a large oil & gas sector (Brunei, East Kalimantan and Sarawak).

The exceptions to this rule are Malaysian states like Penang and Malacca, which are all located on the country’s West Coast and share historical and cultural similarities with Singapore and Kuala Lumpur.

Notably absent from the list are Manila, Ho Chi Minh City and Hanoi: major metropolitan economies which are not among the more prosperous economic regions of ASEAN.

| Rank | Region | GDP-PPP per capita (US$) |
|---|---|---|
| 1 | Singapore | 103,181 |
| 2 | Kuala Lumpur | 83,857 |
| 3 | Brunei | 80,383 |
| 4 | Jakarta | 62,549 |
| 5 | Bangkok | 46,056 |
| 6 | East Kalimantan | 40,833 |
| 7 | Eastern Thailand | 40,179 |
| 8 | Penang | 36,224 |
| 9 | Sarawak | 36,157 |
| 10 | Selangor | 35,624 |
| 11 | Malacca | 33,156 |
| 12 | Samut Sakhon | 33,009 |
| 13 | Negeri Sembilan | 29,761 |
| 14 | Riau Islands | 28,460 |
| 15 | Samut Prakan | 27,543 |

ASEAN’s Most Prosperous Regions (2017-2019), source: Wikipedia

Final Thoughts

The vast regional differences within ASEAN countries are actually quite surprising. While Malaysia is home to Kuala Lumpur (GDP PPP per capita of US$84k), it's also home to Kedah (GDP PPP per capita of US$15k), a more than five-fold income gap! And even if Kuala Lumpur is seen as an extreme, income levels in Penang, which borders Kedah, are already more than twice as high as those in Kedah. Within Indonesia and Thailand you can find similarly large differences.
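The gaps quoted above can be checked with a few lines of R. This is only a minimal sketch: the Kuala Lumpur and Penang figures come from the table above, while the Kedah value is the rounded US$15k mentioned in the text, so the resulting ratios are approximate.

```r
# GDP-PPP per capita in US$; the Kedah value is the rounded figure from the text
gdp_ppp <- c("Kuala Lumpur" = 83857, "Penang" = 36224, "Kedah" = 15000)

# Ratio of each region's income level to Kedah's
gap_vs_kedah <- gdp_ppp / gdp_ppp["Kedah"]
print(round(gap_vs_kedah, 1))
```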

So while it's often emphasized that the income differences between Singapore and its neighbors are vast, they are also vast within countries, including in a relatively small country like Malaysia. Perhaps that’s the real story behind ASEAN’s economic diversity.

Installing Lime Survey on an Ubuntu 20.04 VPS

This tutorial mainly links to other good tutorials and takes you through the entire process, from first accessing your virtual private server (VPS) running Ubuntu 20.04 to having a fully operational installation of Lime Survey. It should serve you well in most Ubuntu setups and is based on the tutorials I used, plus some additional tips and commentary. The whole process should take you about 2 hours, assuming you do it at a relaxed pace with a bit of multi-tasking. Step 3 takes by far the longest.

STEP 0: Access your server using an SSH connection. If you have a Windows operating system, download PuTTY. In Linux you can just type ssh [email protected].

STEP 1: It's good security practice not to log in to your VPS as root but to create a sudo user instead. So do that first.

STEP 2: To install Lime Survey you first need a so-called LAMP stack: Linux, Apache, MySQL and PHP, the basic software stack on which the Lime Survey installation sits. You can follow this LAMP tutorial; however, be sure to install a slightly different collection of PHP packages by running:

sudo apt install php php-cli php-common php-mbstring php-xml php-mysql php-gd php-zip php-ldap php-imap

STEP 3: You can download the latest stable (LTS) community edition of Lime Survey from the official community website. I’ve found it simplest to download it to my PC, unzip everything and then upload the files via SFTP (with the same login credentials you use for SSH access to the VPS). The upload itself can take quite some time (around 1 hour in my experience, just let it do its thing). Place the Lime Survey files in the directory of the virtual host that you made in step 2 (e.g. /var/www/survey).

STEP 4: You need to take a few more steps before Lime Survey is ready to run. (As a reference, there is also an official installation manual.) First of all, certain folders need to be writable by the Apache web server. You can arrange this by making the web server user (www-data) their owner with the following commands:

sudo chown -R www-data:www-data /var/www/survey/tmp
sudo chown -R www-data:www-data /var/www/survey/upload
sudo chown -R www-data:www-data /var/www/survey/application/config

It's also wise to create a MySQL user specifically for Lime Survey, e.g. limeuser. Note down the user name and password for use in step 6.

sudo mysql
CREATE USER 'limeuser'@'localhost' IDENTIFIED BY 'DIFFICULTPA$$WORD';
GRANT SELECT, CREATE, INSERT, UPDATE, DELETE, ALTER, DROP, INDEX ON lime.* TO 'limeuser'@'localhost';

STEP 5: It's important to have an SSL certificate for your domain so that the connection with survey-takers and other users is encrypted. You can get one for free via Let’s Encrypt, a service of the non-profit Internet Security Research Group (ISRG). A tutorial is available. In case the certbot package is not yet installed on the VPS, the first command below installs it and the second requests the certificate and configures Apache:

sudo apt install certbot python3-certbot-apache
sudo certbot --apache

STEP 6: OK, now it’s time for the real test. When you access your Lime Survey installation for the first time, it will check by itself that everything is correctly configured. You will need to provide the MySQL user details (step 4) and the rest is pretty self-explanatory. Woohoo, well done! You’ve installed your own Lime Survey server!

limer code examples

limer is an R package that enables R users to connect directly to a Lime Survey installation via its API (for details, see earlier post), essentially giving you remote control and the possibility of automating certain procedures.

Because the documentation of limer and the Lime Survey API is a bit minimal and therefore quite confusing for a first-time user, I give some simple coding examples below to get you started.

First, we connect to our Lime Survey instance, which is installed at LIMESURVEY.URL and can be accessed with a LIME.USERNAME and LIME.PASSWORD. Obviously you should replace these with your own installation’s details in the code below. The get_session_key() command gets you a unique key through which you can securely access the Lime Survey installation. This key is then used automatically in all subsequent limer calls.

library(limer)

#LimeSurvey Server Info
options(lime_api = 'https://LIMESURVEY.URL/index.php/admin/remotecontrol')
options(lime_username = 'LIME.USERNAME')
options(lime_password = 'LIME.PASSWORD')

get_session_key()

The first example is a simple one that uses a built-in function from limer, get_responses. This simply allows you to download all the data (or only completed data) from a particular survey, #10001 in this case.

responses <- get_responses(10001, sCompletionStatus = 'all')
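Once the responses are in R, they behave like any other data frame. As a minimal sketch, the snippet below keeps only the completed responses after downloading everything. It uses a small mock data frame in place of real get_responses() output, and it assumes that the export contains a submitdate column that is empty for unfinished responses; check the column names of your own export before relying on this.

```r
# Mock data standing in for the output of get_responses(); not real survey data
responses <- data.frame(
  id = 1:3,
  submitdate = c("2021-03-01 10:15:00", NA, "2021-03-02 09:30:00"),
  stringsAsFactors = FALSE
)

# Keep only rows with a non-empty submission date (assumed to mean "completed")
is_complete <- !is.na(responses$submitdate) & responses$submitdate != ""
completed <- responses[is_complete, ]

nrow(completed)
```

In a real session you would skip the mock data frame and apply the filter directly to the object returned by get_responses().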

The second example requires greater knowledge of the LimeSurvey API language because the limer package does not have a neat wrapper for these functions. Instead the generic call_limer function is used in which calls from the original API can be introduced. The full guide of these API functions is available here.

The example below lists all the surveys on the Lime Survey installation (server) and then gets the number of completed responses for one of them. Note that the method passed to the call_limer() function is the same method that is listed in the API documentation, and the params are the arguments of that method. So in this sense, it’s actually quite straightforward.

call_limer(method = "list_surveys") #list surveys on server

call_limer(method = "get_summary", #get number of completed responses
           params = list(iSurveyID = 10001,
                         sStatname = "completed_responses"))

The third and last example showcases some of the more sophisticated automation options. We aim to copy survey 123456, set up a participant table with two extra attributes (Institution and File), add one participant, activate the survey, compose the survey link and then delete the survey.

When the initial survey is copied and the participant is added, the results are stored in tmp and tmp2 because we wish to use these outputs as inputs for later calls.

The fromJSON function (from the jsonlite package, which needs to be loaded first) is also used to feed arrays with multiple pieces of data into the call_limer function. There might also be other ways to do this, but the below example works.

library(jsonlite) #provides fromJSON

tmp <- call_limer(method = "copy_survey", #copy a survey
                  params = list(iSurveyID_org = 123456,
                                sNewname = 'The Copied Survey'))

call_limer(method = "activate_tokens", #setup participant table
           params = list(iSurveyID = tmp$newsid,
                         aAttributeFields = fromJSON('{"attribute_1":"Institution","attribute_2":"File"}')))

tmp2 <- call_limer(method = "add_participants", #add participant
                   params = list(iSurveyID = tmp$newsid,
                                 aParticipantsData = fromJSON('[{"email":"[email protected]","lastname":"Bond","firstname":"James","attribute_1":"Secret Service","attribute_2":"mi5","usesleft":999999}]'),
                                 bCreateToken = TRUE))

call_limer(method = "activate_survey", #activate survey
           params = list(iSurveyID = tmp$newsid))

paste0('https://LIMESURVEY.URL/index.php/', tmp$newsid, '?token=', tmp2$token, '&newtest=Y') #generate survey link

call_limer(method = "delete_survey", #delete the survey
           params = list(iSurveyID = tmp$newsid))
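To see what fromJSON actually hands to call_limer in the calls above, it can help to inspect its output on its own, without any server connection. The sketch below is illustrative only: the participant is hypothetical, and whether a plain R list passed to call_limer is serialized in exactly the same way is an assumption I have not verified.

```r
library(jsonlite)

# A JSON object becomes a named list
attribute_fields <- fromJSON('{"attribute_1":"Institution","attribute_2":"File"}')
str(attribute_fields)

# A JSON array of objects becomes a data frame, one row per participant
# (hypothetical participant, for illustration only)
participants <- fromJSON('[{"lastname":"Doe","firstname":"Jane","attribute_1":"Example University"}]')
str(participants)

# The same attribute fields written as a plain R list, for comparison
attribute_list <- list(attribute_1 = "Institution", attribute_2 = "File")
identical(attribute_fields, attribute_list)
```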

As is hopefully clear now, the limer package offers some powerful options for automating the setting up of surveys as well as importing the data into R.

Finally, it's good practice to close off the session with the following call.

release_session_key()

Lime Survey and R: limer

Lime Survey is (probably) the world’s most popular open source survey package and R is the world’s most popular open source statistical and data analysis software (probably). So it seems only natural that there should be a bridge between the two: where the limitations of Lime Survey begin, R can take over and vice-versa.

Thankfully the bridge has been laid by an R package called limer. limer uses the API functionality that is built into Lime Survey to make calls to a Lime Survey instance. The most useful of these is probably the ability to get survey responses. But there is a wealth of other options too, including copying and deleting surveys, getting survey statistics, etc.

The GitHub page of limer provides instructions on how to install it and how to make some basic calls. You also need to enable the API in your Lime Survey installation, which you can do under Settings –> Global Settings –> Interfaces. Be sure to set it to “JSON-RPC”; you will also see the URL to access the API there (see image below).

Details on the Lime Survey API are available, although admittedly the documentation is a bit thin.

Assuming you have configured your Lime Survey instance with an SSL certificate (giving the https:// URL), the connection between R and Lime Survey is also encrypted, so any personal or sensitive data exchanged between Lime Survey and R is protected in transit.
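A quick way to make sure you are not accidentally pointing R at an unencrypted endpoint is to check the configured API URL before connecting. This is a minimal sketch: it assumes the URL is stored in the lime_api option that limer reads, and LIMESURVEY.URL is a placeholder for your own domain.

```r
# Placeholder URL; replace with the API URL shown in your Lime Survey settings
options(lime_api = "https://LIMESURVEY.URL/index.php/admin/remotecontrol")

# TRUE if the configured API endpoint uses an encrypted (https) connection
uses_https <- startsWith(getOption("lime_api"), "https://")

if (!uses_https) {
  warning("lime_api does not use https; credentials and data would travel unencrypted")
}
```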

limer coding examples are to follow in a later post.