
Friday 22 February 2013

What to write in a birthday card

  1. Happy, happy, happy birthday!! Don't forget you're a year older.
  2. Thanks for inviting us all to come celebrate your birthday!! Hope the cake is good :) All the best.
  3. Wishing you a very Happy Birthday!! We hope you have a great day and all your wishes come true.
  4. Wishing you a very Happy Birthday!! We hope you have a wonderful day and get spoilt with gifts!
  5. Happy Birthday!! We hope all your dreams and wishes come true
  6. Happy Birthday!! Wishing you all the best for today and in the future. Now let's PARTY!!!
  7. Better late than never so HAPPY BIRTHDAY!! Wishing you all the best my friend - All the best.
  8. Happy birthday old [man/lady]!
  9. Happy birthday you oldie but goodie!! We hope all your wishes come true.
  10. Wishing you a happy [X]th birthday. All the best.
  11. HAPPY BIRTHDAY!!! Wishing you a great year ahead.
  12. Happy Birthday!! You're now older and hopefully wiser - Have a great day.
  13. Card messages aren't my thing - Happy Birthday!
  14. Happy birthday [name]!! We hope you have a great day and get spoilt rotten! Love you lots.
  15. Wishing you a great birthday!! All the best and we hope you get lots of presents!! All the best.
  16. Happy birthday to my best friend!! Wishing you all the best and we hope you have a great year ahead
  17. Wishing you all the best on your birthday - We hope you get spoiled with lots of presents!! All the best.
  18. Happy Birthday!! Tonight will be a big one - All the best.
  19. Wishing you a great day, year, century (just joking) - HAPPY BIRTHDAY!! Hope you have a great day.
  20. Happy birthday oldie!! We'll be pumping the music up tonight just so you can hear it.
  21. Happy birthday!! Wishing you all the best on your special day.

Source: http://www.greeting-card-messages.com/what-to-write-in-a-birthday-card.php

Monday 18 February 2013

New Economic Thinking

New economic thinking for me involves the study of economic phenomena from a perspective which sees economic systems as non-linear and dynamic. This approach is new because it models the interactions among agents in a more complex and realistic way than much of standard economics does. The complexity approach enables us to gain an alternative understanding of how aggregate-level properties emerge from micro-level behaviours.

In my own research, on cooperation in agricultural collectives, researchers have argued that households in collectives tend to shirk because shirking is a rational choice for the individual household, and as a consequence mutual shirking (i.e. non-cooperation) results in a Nash equilibrium. This logic is widely used to explain the failure of agricultural collectivization. However, it fails to explain the existence of successful agricultural collectives, the persistence of low-efficiency collectives (e.g. the People's communes in China) over long periods of time, and the emergence of private farms (e.g. the household responsibility system in China) out of strict collectives. I believe this is because the standard economics approach to modelling agricultural collectives cannot capture all the non-linear dynamics found in real agricultural collectives. A complexity approach makes it possible to model the complex interactions between households, and between households and government. It also allows aggregate-level features, like social cognition and trust, that emerge as a consequence of long-term interaction in collectives to be included, and these can affect individual-level decision making. These interactions shape and reshape the way households behave in various ways and at different times, and being able to include them in economic models will better enable us to understand economic phenomena.

In order to model economic phenomena as complex and non-linear systems it is possible to use agent-based simulation, which is a more flexible means of modelling than equation-based modelling. Using agent-based models it is possible to create heterogeneous agents (e.g. households, collectives) that have multiple attributes (e.g. marginal productivity of effort) and preferences (e.g. preference for risk), as well as being able to conduct bottom-up analysis, test deviations from rational choice theory, and include multiple ideas from across the social sciences.
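
As a rough sketch of what such an agent-based model might look like, here is a minimal Python example; the attributes, the effort and trust rules, the equal-sharing scheme and every parameter value are hypothetical illustrations, not the actual model used in my research:

    import random

    # Toy agent-based model of households in a collective. All attributes,
    # behavioural rules and parameter values here are hypothetical.

    class Household:
        def __init__(self):
            self.productivity = random.uniform(0.5, 1.5)  # marginal productivity of effort
            self.risk_preference = random.random()        # preference for risk (placeholder attribute)
            self.trust = 0.5                              # trust in the collective
            self.effort = 0.5

        def decide_effort(self, avg_effort):
            # Households shirk less when trust and observed cooperation are high.
            self.effort = 0.2 + 0.8 * self.trust * avg_effort

        def update_trust(self, share_received, contribution):
            # Trust rises when the share received matches or exceeds the contribution.
            self.trust = min(1.0, max(0.0, self.trust + 0.1 * (share_received - contribution)))

    households = [Household() for _ in range(20)]
    avg_effort = 0.5
    for year in range(30):
        for h in households:
            h.decide_effort(avg_effort)
        output = sum(h.productivity * h.effort for h in households)
        share = output / len(households)                  # equal distribution of output
        for h in households:
            h.update_trust(share, h.productivity * h.effort)
        avg_effort = sum(h.effort for h in households) / len(households)
        print(year, round(avg_effort, 2), round(output, 2))

Even a toy model like this shows how an aggregate-level feature (trust) can emerge from repeated interaction and feed back into individual effort decisions, which is exactly the kind of dynamic that equation-based models struggle to capture.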

Complexity economics can improve economic thinking in a number of respects, and the demand for new economic thinking comes from several arenas.
  1. The objects economics studies, economic phenomena, are complex, and this complexity keeps growing as communication and interaction amongst economic actors increase. This requires economists to improve the way they deal with complexity in economic systems.
  2. The public, as the final consumers of economic analysis, have been let down by poor economic predictions, and since the 2008 financial crisis much of the public has had little faith in the ability of economists to provide accurate information about the economy. Taking the complexity of economic systems into consideration may help economists improve their work and regain that trust.
  3. Complexity economics offers a new paradigm for examining economic phenomena. This paradigm, unlike the reductionism that standard economics applies, emphasizes the non-linear dynamics of economic systems and as a consequence deals with economic phenomena in a more realistic way. Combined with modern (computational) analytical tools, complexity economics is expected to compensate for several weaknesses of standard economic research, both theoretically and methodologically.
It's worth mentioning that I believe complexity economics is complementary to, rather than a substitute for, standard economics. Each has its own strengths and weaknesses. For example, complexity economics can treat phenomena more realistically, but there is no settled modelling convention to follow, which can confuse researchers. Standard economics can present its ideas through clear logical deduction (with the aid of mathematical formulas), but it relies heavily on strong assumptions, which undermines its realism. The two approaches should therefore cooperate rather than compete with each other. This is especially important for complexity economics, which is still coming into maturity.

Friday 15 February 2013

Reinforcement Learning Overview

I. What is RL (Reinforcement Learning)?

One important branch of computer science is AI (Artificial Intelligence). Machine learning is a subcategory of AI that has become a hot research area in recent years.

Machine learning in general can be classified into three categories:
1) Supervised learning (SL). Learning in which you know both the inputs and the desired outputs.
2) Unsupervised learning (USL). Learning in which you know the inputs but not the outputs.
3) Reinforcement learning (RL). This kind of learning falls between the first two categories: you have the inputs but not the outputs; instead you have a "critic" -- a signal telling you whether the learner's output is right or wrong. RL is typically the problem of an agent acting in an environment and trying to achieve the best performance over time by trial and error.

The standard model of RL is the agent-environment loop [3, p368]: at each step the agent, in state s_i, chooses an action a_i, obtains reward r_i, and moves to state s_{i+1}. The goal is to maximize the long-term discounted reward r_0 + γ r_1 + γ² r_2 + ..., where γ is called the discounting factor.
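
As a rough illustration of this loop, here is a minimal Python sketch; the toy environment, the random policy and all numbers are hypothetical, purely to show how the discounted reward accumulates:

    import random

    # Toy agent-environment loop. The environment dynamics, the random policy
    # and the parameter values are hypothetical, for illustration only.

    def step(state, action):
        """Hypothetical environment: returns (next_state, reward)."""
        next_state = (state + action) % 5
        reward = random.random()
        return next_state, reward

    gamma = 0.9               # discounting factor
    state = 0
    discounted_reward = 0.0

    for t in range(100):                      # one episode of 100 steps
        action = random.choice([0, 1])        # a_i: chosen here at random
        state, reward = step(state, action)   # s_i -> s_{i+1}, obtain r_i
        discounted_reward += (gamma ** t) * reward

    print(discounted_reward)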

II. Central features and issues of RL

RL dates back to the 1960s, originating from research on dynamic programming and Markov decision processes. Monte Carlo methods are another source of RL methods, learning from stochastic sampling of the sample space. A third group of methods, specific to RL, is temporal difference methods (TD(lambda)), which combine the merits of dynamic programming and Monte Carlo methods and were developed in the 1980s mostly through the work of Sutton and Barto, among others.

One of the simplest RL problems is the bandit problem. One important RL algorithm is Q-learning, introduced in the next section.

Models of optimal behavior in RL can be classified into 1) the finite-horizon model, 2) the infinite-horizon discounted model, and 3) the average-reward model. [1]
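
Written out in the notation used above (r_t is the reward at step t, h the horizon, γ the discounting factor), the quantity the agent tries to maximize under each model is, roughly:

    Finite-horizon:              E[ r_0 + r_1 + ... + r_h ]
    Infinite-horizon discounted: E[ r_0 + γ r_1 + γ² r_2 + ... ],  with 0 ≤ γ < 1
    Average-reward:              lim (h→∞) E[ (r_0 + r_1 + ... + r_h) / h ]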

To measure learning performance, criteria include 1) eventual convergence to optimal behavior, 2) speed of convergence to (near-)optimality, and 3) regret, the expected loss in reward incurred by running the learning algorithm instead of behaving optimally from the start. [1]

The three main categories of RL algorithms are: 1) dynamic programming, 2) Monte Carlo methods, and 3) temporal difference methods. [2]

Ad-hoc strategies used in balancing RL exploration/exploitation include greedy strategies, randomized strategies (e.g. Boltzmann exploration), interval-based techniques and more. [1]

RL algorithms can also be classified into model-free methods and model-based methods. Model-free methods include Monte Carlo methods and temporal difference methods. Model-based methods include dynamic programming, certainty equivalent methods, Dyna, prioritized sweeping/queue-Dyna, RTDP (Real-Time Dynamic Programming), the Plexus planning system, etc. More of these are briefly described in [1].

Some of the central issues of RL are:
  • Exploration vs. exploitation
This is illustrated by the bandit problem, in which a bandit machine has several levers with different payoff values. The player is given a fixed number of chances to pull the levers and needs to balance the number of trials used to find the lever with the best payoff against the number of trials spent pulling only that lever (a minimal ε-greedy sketch follows this list).
  • Life-long learning
The learning is real-time and continues through the entire life of the agent. The agent learns and acts simultaneously. This kind of life-long learning is also called "online learning".
  • Immediate vs. delayed reward
An RL agent needs to maximize the expected long-term reward. To achieve this the agent needs to consider both immediate and delayed reward, and try not to get stuck in a local optimum.
  • Generalization over input and action
When a model-free method is used to find the strategy (e.g., in Q-learning), a problem is how to apply the learned knowledge to parts of the world not yet experienced. Model-based methods are better in this situation, but they need enough prior knowledge about the environment, which may be unrealistic, and their computational burden suffers from the curse of dimensionality of the environment. Model-free methods, on the other hand, require no prior knowledge, but make inefficient use of the learned knowledge, thus require much more experience to learn, and do not generalize well.
  • Partially observable environments
The real world may not allow the agent to have a full and accurate perception of the environment, so partial information must often be used to guide the behavior of the agent.
  • Scalability
So far, available RL algorithms lack a way of scaling up from toy applications to real-world applications.
  • Principles vs. domain knowledge
This is a general problem faced by AI: a general problem solver based on first principles does not exist. Different algorithms are needed to solve different problems. Moreover, domain (field) knowledge often has to be added to significantly improve the performance of the solution.
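
As promised under exploration vs. exploitation above, here is a minimal ε-greedy bandit sketch in Python; the levers' payoff probabilities, ε and the number of pulls are hypothetical values chosen for illustration:

    import random

    # Minimal epsilon-greedy bandit. The payoff probabilities, epsilon and the
    # number of pulls are hypothetical values for illustration only.

    payoff_probs = [0.2, 0.5, 0.8]      # unknown to the player
    n_levers = len(payoff_probs)
    epsilon, n_pulls = 0.1, 1000

    counts = [0] * n_levers             # times each lever was pulled
    values = [0.0] * n_levers           # running average payoff per lever

    for _ in range(n_pulls):
        if random.random() < epsilon:   # explore: pull a random lever
            lever = random.randrange(n_levers)
        else:                           # exploit: pull the best lever so far
            lever = values.index(max(values))
        reward = 1.0 if random.random() < payoff_probs[lever] else 0.0
        counts[lever] += 1
        values[lever] += (reward - values[lever]) / counts[lever]

    print(counts, [round(v, 2) for v in values])

With a small ε the player spends most pulls on the lever that currently looks best, while still occasionally sampling the others in case an apparently worse lever is actually better.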

III. Q-learning

The Q-learning algorithm, introduced by Watkins in 1989, is rooted in dynamic programming and is a special case of TD(lambda) with lambda = 0. It handles discounted infinite-horizon MDPs (Markov Decision Processes), is easy to implement, is exploration insensitive, and is so far one of the most popular, and seemingly the most effective, model-free algorithms for learning from delayed reinforcement. However, it does not address the scaling problem, and may converge quite slowly.

The Q-learning rule is [3, p373]:

    Q(s, a) <- Q(s, a) + α [ r + γ max_a' Q(s', a') - Q(s, a) ]

where: s - current state, s' - next state, a - action, a' - action in the next state, r - immediate reward, α - learning rate, γ - discount factor, Q(s,a) - expected discounted reinforcement of taking action a in state s. <s, a, r, s'> is an experience tuple.

The Q-learning algorithm is:

    For each s, a, initialize the table entry Q(s, a) <- 0
    Observe the current state s
    Do forever:
        Select an action a and execute it
        Receive immediate reward r
        Observe the new state s'
        Update the table entry: Q(s, a) <- Q(s, a) + α [ r + γ max_a' Q(s', a') - Q(s, a) ]
        s <- s'
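
As a concrete illustration of the algorithm, here is a minimal tabular Q-learning sketch in Python. The 5-state "chain" environment, its reward and all parameter values (α, γ, ε, number of episodes) are hypothetical choices of mine, not taken from [3]; only the update line is the rule above:

    import random
    from collections import defaultdict

    # Tabular Q-learning on a hypothetical 5-state chain: start in state 0,
    # move left/right, and receive reward 1 only on reaching state 4 (the goal).

    N_STATES = 5
    ACTIONS = (-1, +1)                       # move left or right
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # hypothetical parameter choices

    Q = defaultdict(float)                   # Q[(s, a)] initialized to 0

    def step(s, a):
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        return s2, r

    for _ in range(500):                     # episodes
        s = 0
        while s != N_STATES - 1:
            if random.random() < EPSILON:    # explore: random action
                a = random.choice(ACTIONS)
            else:                            # exploit: greedy action, ties broken at random
                best = max(Q[(s, act)] for act in ACTIONS)
                a = random.choice([act for act in ACTIONS if Q[(s, act)] == best])
            s2, r = step(s, a)
            best_next = max(Q[(s2, act)] for act in ACTIONS)
            # Q(s,a) <- Q(s,a) + alpha [ r + gamma max_a' Q(s',a') - Q(s,a) ]
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            s = s2

    print({k: round(v, 2) for k, v in sorted(Q.items())})

After enough episodes the table should show Q(s, +1) > Q(s, -1) for every non-goal state, i.e. the greedy policy walks toward the goal.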

The conditions for convergence are:
  • The system is a deterministic MDP.
  • The immediate reward values are bounded by some constant.
  • The agent visits every possible state-action pair infinitely often.

IV. Applications of RL

Some examples of the applications of RL are in game playing, robotics, elevator control, network routing and finance.

TD-Gammon is the application of the TD(lambda) algorithm to backgammon. It achieved the competence level of the best human players. However, some people argue that its designer was already an accomplished backgammon programmer before incorporating the TD(lambda) algorithm, and thus drew a lot on prior programming knowledge of the game. That said, similar success has not been found in other games such as chess or Go. Another attempt that claims very good performance was a trial on Tetris [16].

For robotics, many applications and experiments have already been carried out.

V. Literature

At this time the most important textbook on RL is [2], written by Sutton and Barto in 1998. [3] is a popular textbook for machine learning, which devotes chapter 13 to RL. [1] gives a brief introduction to the major topics in RL but lacks detail. [4] discusses the relationship between RL and dynamic programming. A lot of material about RL is available from Sutton's homepage [5, 6, 12] and the RLAI lab [8] at the University of Alberta, where Sutton is based. The value iteration and policy iteration methods used in RL can be traced back to the work of Howard [19].

References:
  1. Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore. Reinforcement Learning: A Survey. (1996) 
  2. Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. (1998) 
  3. Tom M. Mitchell. Machine Learning. (1997) 
  4. Barto, A., Bradtke, S., Singh, S. Learning to act using real-time dynamic programming. Artificial Intelligence, Special volume: Computational research on interaction and agency. 72(1), 81-138. (1995) 
  5. Homepage of Richard S. Sutton
  6. Richard S. Sutton. 499/699 courses on Reinforcement Learning. University of Alberta, Spring 2006. 
  7. Reinforcement learning - good source of RL materials, readings online. 
  8. RL and AI - RL community. 
  9. RL research repository at UM - a centralized resource for research on RL. 
  10. RL introduction warehouse
  11. RL using NN, with applications to motor control - A PhD thesis (2002, French). 
  12. RL FAQ - by Sutton, initiated 8/13/2001, last updated on 2/4/2004. 
  13. RL and AI research team - iCore, Sutton. 
  14. RL research problems - 1) scaling up, 2) partially-observable MDP. 
  15. Application of RL to dialogue strategy selection in a spoken dialogue system for email - 2000 
  16. RL tetris example - Few applications of RL seem to exist besides backgammon; here is a try with Tetris. The results seem good. 1998. 
  17. Q-learning by examples - numeric example, tower of hanoi, using matlab, Excel etc. 
  18. RL course website - Utrecht University, 2006 Spring. 
  19. Dynamic Programming and Markov Processes. Ronald A. Howard. 1960. 


Source: http://www2.hawaii.edu/~chenx/ics699rl/grid/rl.html#abstract