Saturday 13 September 2014

Dynamic Programming - Game Value of Rolling N Dices with S Sides Until Rollong Two Same Value in Row

1. Game Description:
Dice: S sides, 1, 2, 3, ..., S
Roll N dices,
    - If the sum of N dices is not equal to last rolling, then the reward is sum together
    - If the sum of N dices is equal to last rolling, then the reward is 0.
Calculate the game value. Here is a concrete example.

2. Analysis
Suppose that on the current state, the accumulated rewards as R and current dice sum is K represent the current state as V(RK). Now what is the strategy to stop rolling.

The Game Strategy: stop rolling and keep the current reward if and only if the next rolling will not increase the reward statistically.

Given N dices with S sides, where each side of the dice is from 1, 2, ..., S, then the possible value of SUM K is [NN*S]. Let's use Qi to represent the total combination of SUM Ki given N dices and S sides and Qi can be computed as Dynamic programming - Sum K of N dices with S sides, where Qi should meet this constraint,
        Sigma(Qi) = Pow(SN), where i within [NN*S].
Then the probability Pi of rolling Ki is Qi/Pow(SN) : Pi = Qi/Pow(SN).

According to the strategy, it can be represented mathematically as
        V(RKi) >= E(V(V(RKi), Kj)), where
                E: represents the expected value given the fact of V(RKi) and rolling again.
According to the game:
        If Ki = Kj, then the reward  is 0,
        If Ki != Kj, then the reward is V(RKi) + Kj
Hence E(V(V(RKi), Kj) can be computed as
        Pn*(V(RKi) + Kn)) + Pn+1*(V(RKi) + Kn+1)) + ... + Pi-1(V(RKi) + Ki-1)) +
        Pi*0 + Pi+1(V(RKi) + Ki+1)) + ... + Pn*s *(V(RKi) + Kn*s))
        = Sigma(Pj*(V(R,Ki) + Kj)) - Pi*(V(R, Ki) + Ki)) , where
                j belongs to [NN*S]
According to the strategy, now we should stop rolling if
        V(RKi) >= E(V(V(RKi), Kj))    ==>
        V(R, Ki) >= Sigma(Pj*(V(RKi) + Kj)) - Pi*(V(RKi) + Ki))
        V(RKi) >= Sigma(Pj*V(RKi)) + Sigma(Pj*Kj) - Pi(V(RKi) + Ki)), where
                Sigma(Pj*V(RKi)) = V(RKi) * Sigma(Pj) = V(RKi)
                Sigma(Pj*Kj) = E(Kj), expected value of SUM of N dices with S sides
        V(RKi) >= V(RKi) + Sigma(Pj*Kj) - Pi*V(RKi) - Pi*Ki
        Pi*V(RKi) >= Sigma(Pj*Kj) - Pi*Ki, where
                Pi Qi/Pow(SN) > 0, both Qi > 0 and Pow(SN) > 0
        Qi*V(RKi) >= Pow(SN)*(Sigma(Pj*Kj) - Pi*Ki)
        Qi*V(RKi) >= Pow(SN)*(Sigma(Qj/Pow(SN)*Kj) - Qi/Pow(SN)*Ki)
        Qi*V(RKi) >= Sigma(Qj*Kj) - Qi*Ki
        V(RKi) >= Sigma(Qj*Kj)/Qi - Ki

Let's assume EE = Sigma(Qj*Kj), which is a constant value given N dices with S sides. Then the above constraint tells us that at the current state V(RKi), if R is not less than EE/Qi - Ki, then the player should stop rolling and keep the reward.

Calculate the game value:
        GV = Sigma(Pi*V(0, Ki)), where
                i: [NN*S]
                V(RKi) = R, if V(RKi) >= Sigma(Qj*Kj)/Qi - Ki, else
                            = V(RKi) + E(Kj) - Pi(V(RKi) + Ki))

The strategy value of this concrete example:
Let's take a look at a concrete case as N = 2 and S = 6. Then
        K = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
        Q = {1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1}
        Pow(SN) = Pow(6, 2) = 6*6 = 36, where
                Sigma(Q) = Pow(SN) = 36
        P = {1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36}, where
                Sigma(P) = 1
        EE = 252
Then now calculate,
        V(R, 2) >= 252/1 - 2 = 250
        V(R, 3) >= 252/2 - 3 = 123
        V(R, 4) >= 252/3 - 4 = 80
        V(R, 5) >= 252/4 - 5 = 58
        V(R, 6) >= 252/5 - 6 = 44.4
        V(R, 7) >= 252/6 - 7 = 35
        V(R, 8) >= 252/5 - 8 = 42.4
        V(R, 9) >= 252/4 - 9 = 54
        V(R, 10) >= 252/3 - 10 = 74
        V(R, 11) >= 252/2 - 11 = 115
        V(R, 12) >= 252/1 - 12 = 240
   
3. C++ Implementation

// ********************************************************************************
#include <iostream>
#include <math.h>
#include <numeric>
#include <time.h>
#include <unordered_map>
#include <utility>
#include <vector>

// In blue color, the implementation of 
// Dynamic programming - Sum K of N dices with S sides via hash map 
struct DiceKey {
    size_t nDice;
    size_t curSum;
    bool operator()(const DiceKey& right) const {
        return nDice == right.nDice && curSum == right.curSum;
    }
};

struct DiceKey_Hash {
    size_t operator()(const DiceKey& key) const {
        return std::hash<size_t>()(key.nDice) ^ std::hash<size_t>()(key.curSum);
    }

    bool operator()(const DiceKey& k1, const DiceKey& k2) const {
        return k1.operator()(k2);
    }
};
typedef std::unordered_map<DiceKey, size_t, DiceKey_Hash, DiceKey_Hash> DiceCacheMap;

size_t SumOfDiceAlg(long s, long n, long k, DiceCacheMap &dcm)
{
    if (k < n || k >(n*s)) {
        // no solution found in this case
        return 0;
    }
    if (n == 1 && k > 0 && k <= s) {
        // found a solution here
        return 1;
    }

    const DiceKey key{ n, k };

    if (dcm.find(key) != dcm.end()) {
        return dcm[key];
    }

    // this loop is the sub-problem illustration
    //  f(S, N, K) = sigma(f(S, N-1, K-Si))
    size_t count = 0;
    for (long j = 1; j <= s; ++j) {
        count += SumOfDiceAlg(s, n - 1, k - j, dcm);
    }

    dcm.insert(std::make_pair(key, count));

    return count;
}

size_t SumOfDice(long s, long n, long k)
{
    DiceCacheMap dcm;

    size_t count = SumOfDiceAlg(s, n, k, dcm);

    return count;
}

// Calculate "Qi" and "Pi"
void CalculateQiAndPiOfSums(size_t s,
    size_t n,
    std::vector<size_t>& counts,
    std::vector<double>& probablity)
{
    if (s == 0 || n == 0) {
        return;
    }

    counts.clear();
    counts.reserve(s*n - n + 1);
    probablity.clear();
    probablity.reserve(s*n - n + 1);

    double allCombinations = pow((double)s, (double)n);
    for (size_t count, sum = n; sum <= n*s; ++sum) {
        count = SumOfDice(s, n, sum);
        counts.push_back(count);
        probablity.push_back((double)count / allCombinations);
    }
}

// Calculate "EE"
size_t CalculateSigmaQiKi(size_t sides,
    size_t nDices,
    const std::vector<size_t>& counts)
{
    size_t sigma = 0;
    for (size_t sum = nDices; sum < sides*nDices + 1; ++sum) {
        sigma += sum * counts[sum - nDices];
    }

    return sigma;
}

// Calculate "Strategy" and "Pi"
void CalculateStrategyAndPiOfSums(size_t sides,
    size_t nDices,
    std::vector<double>& strategyVals,
    std::vector<double>& probablity)
{
    std::vector<size_t> counts;
    CalculateQiAndPiOfSums(sides,
        nDices,
        counts,
        probablity);

    size_t sigma = CalculateSigmaQiKi(sides,
        nDices,
        counts);

    strategyVals.clear();
    strategyVals.reserve(counts.size());
    double strategyVal;
    for (size_t indx = 0, sum = nDices; indx < counts.size(); ++indx, ++sum) {
        strategyVal = (double)sigma / (double)counts[indx] - sum;
        strategyVals.push_back(strategyVal);
    }
}

struct CurrentState{
    double reward;
    size_t curSum;
    CurrentState(double r, size_t s)
        : reward(r), curSum(s)
    {}
    bool operator()(const CurrentState& right) const {
        return reward == right.reward && curSum == right.curSum;
    }
};

struct CurrentState_HASH{
    size_t operator()(const CurrentState& key) const {
        return std::hash<double>()(key.reward) ^ std::hash<size_t>()(key.curSum);
    }

    bool operator()(const CurrentState& k1, const CurrentState& k2) const {
        return k1.operator()(k2);
    }
};

typedef std::unordered_map<CurrentState, double, CurrentState_HASH, CurrentState_HASH> CS_CACHE;

double DiceGameStrategy(size_t sides,
    size_t nDices,
    const std::vector<double>& strategyVals,
    const std::vector<double>& probablity,
    double reward,
    size_t curSum,
    CS_CACHE& cache)
{
    if (reward >= strategyVals[curSum - nDices]) {
        return reward;
    }

    const CurrentState key{ reward, curSum };
    {
        CS_CACHE::const_iterator cachedIter = cache.find(key);
        if (cachedIter != cache.end()) {
            return cachedIter->second;
        }
    }

    double expectedReward = 0;
    double p;
    for (size_t sum = nDices; sum < sides*nDices + 1; ++sum) {
        if (sum != curSum) {
            p = probablity[sum - nDices];
            expectedReward += p* DiceGameStrategy(sides,
                nDices,
                strategyVals,
                probablity,
                reward + sum,
                sum,
                cache);
        }
    }

    cache.insert(std::make_pair(key, expectedReward));

    return expectedReward;
}

// Calculate "GV"
double DiceGame(size_t sides, size_t nDices)
{
    std::vector<double> strategyVals;
    std::vector<double> probablity;

    CalculateStrategyAndPiOfSums(sides,
        nDices,
        strategyVals,
        probablity);

    CS_CACHE cache;
    double gameValue = 0;
    // std::vector<double> gameValues;
    for (size_t sum = nDices; sum < (sides * nDices + 1); ++sum) {
        double val = DiceGameStrategy(sides,
            nDices,
            strategyVals,
            probablity,
            0,
            sum,
            cache);
        gameValue += probablity[sum - nDices] * val;
        //gameValues.push_back(val);
    }

    return gameValue;
}

// test
int _tmain(int argc, _TCHAR* argv[])
{
    DiceGame(6, 2); // gameValue:    22.224974509843324
    DiceGame(6, 3); // gameValue:    40.471468665548528
    DiceGame(6, 4); // gameValue    61.969998311984483

    return 0;
}
//********************************************************************************

4. Further Discussion
I believe that one of key deciders of the game value is the game strategy. Currently the game strategy is only to keep rolling if the next rolling is to improve the reward statistically, otherwise stop rolling and keep the reward. This might not be the optimal strategy, because I have not been able to prove it is optimal, or the global optima. The question is if the short-run setback on the reward is possible to bring the long-run optimal.

Bibliography:
[1] http://cpluspluslearning-petert.blogspot.co.uk/2014/04/dynamic-programming-sum-k-of-dice-n.html
[2] Matthew M. Conroy, "A Collection of Dice Problems", Version June 15, 2012
[3] Donald E. Knuth "The Art of Computer Programming", Volume 1, Fundamental Algorithms, Third Edition
[4] Donald E. Knuth "The Art of Computer Programming", Volume 2, Seminumerical Algorithms, Third Edition
[5] http://math.stackexchange.com/questions/302472/whats-the-optimal-strategy-of-this-dice-game

No comments:

Post a Comment