Thursday, 15 September 2016

Loading and Manipulating Historical Data From .csv Files

In my last post I said I was going to look at data wrangling my data, and this post outlines what I have done since then.

My problem was that I have numerous csv files containing historical data with different date formats and frequency, e.g. tick level and hourly and daily OHLC, and in the past I have always struggled with this. However, I have finally found a solution using the R quantmod package, which makes it easy to change data into a lower frequency. It took me some time to finally get what I wanted but the code box below shows the relevant R code to convert hourly OHLC, contained in one .csv file, to daily OHLC which is then written to a new .csv file.
library("quantmod", lib.loc="~/R/x86_64-pc-linux-gnu-library/3.3")
price_data = read.csv( "path/to/file.csv" , header = FALSE )
price_data = xts( price_data[,2:6] , order.by = as.Date.POSIXlt( strptime( price_data[,1] , format = "%d/%m/%y %H:%M" , tz = "" ) ) )
price_data_daily = to.daily( price_data , drop.time = TRUE )
write.zoo( price_data_daily , file = "path/to/new/file.csv" , sep = "," , row.names = FALSE , col.names = FALSE )
To finally achieve such a small snippet of working code I can't believe how much time I had to spend reading documentation and looking online.

This next code box shows Octave code to load the above written .csv file into Octave
fid = fopen( 'path/to/file' , 'rt' ) ;
data = textscan( fid , '%s %f %f %f %f' , 'Delimiter' , ',' , 'CollectOutput', 1 ) ;
fclose( fid ) ;
eurusd = [ datenum( data{1} , 'yyyy-mm-dd' ) data{2} ] ;
clear data fid
Hopefully, in both cases, manipulating the format strings "%d/%m/%y %H:%M" and 'yyyy-mm-dd' in these two respective code snippets will save you the hours I spent.

Useful links that helped me are:

Saturday, 3 September 2016

Possible Addition of NARX Network to Conditional Restricted Boltzmann Machine

It has been over three months since my last post, due to working away from home for some of the summer, a summer holiday and moving home. However, during this time I have continued with my online reading and some new thinking about my conditional restricted boltzmann machine based trading system has developed, namely the use of a nonlinear autoregressive exogenous model in the bottom layer gaussian units of the CRBM. Some links to reading on the same are shown below.
The exogenous time series I am thinking of using, at least for the major forex pairs and perhaps precious metals, oil and US treasuries, is a currency strength indicator based on the US dollar. In order to create the currency strength indicator I will have to delve into some data wrangling with the historical forex data I have, and this will be the subject of my next post.

    Monday, 16 May 2016

    Giving Up on Recursive Sine Formula for Period Calculation

    I have spent the last few weeks trying to get my recursive sine wave formula for period calculations to work, but try as I might I can only get it to do so under ideal theoretical conditions. Once any significant noise, trend or combination thereof is introduced the calculations explode and give meaningless results. In light of this, I am no longer going to continue this work.

    Apart from the above work I have also been doing my usual online research and have come across John Ehler's autocorrelation periodogram for period measurement, and below is my Octave C++ .oct implementation of it.
    DEFUN_DLD ( autocorrelation_periodogram, args, nargout,
    "-*- texinfo -*-\n\
    @deftypefn {Function File} {} autocorrelation_periodogram (@var{input_vector})\n\
    This function takes an input vector ( price ) and outputs the dominant cycle period,\n\
    calculated from the autocorrelation periodogram spectrum.\n\
    @end deftypefn" )
    
    {
    octave_value_list retval_list ;
    int nargin = args.length () ;
    
    // check the input arguments
    if ( nargin != 1 ) // there must be a price vector only
       {
       error ("Invalid arguments. Input is a price vector only.") ;
       return retval_list ;
       }
    
    if ( args(0).length () < 4 )
       {
       error ("Invalid argument length. Input is a price vector of length >= 4.") ;
       return retval_list ;
       }
    
    if ( error_state )
       {
       error ("Invalid argument. Input is a price vector of length >= 4.") ;
       return retval_list ;
       }
    // end of input checking 
    
    ColumnVector input = args(0).column_vector_value () ;
    ColumnVector hp = args(0).column_vector_value () ; hp.fill( 0.0 ) ;
    ColumnVector smooth = args(0).column_vector_value () ; smooth.fill( 0.0 ) ;
    ColumnVector corr ( 49 ) ; corr.fill( 0.0 ) ;
    ColumnVector cosine_part ( 49 ) ; cosine_part.fill( 0.0 ) ;
    ColumnVector sine_part ( 49 ) ; sine_part.fill( 0.0 ) ;
    ColumnVector sq_sum ( 49 ) ; sq_sum.fill( 0.0 ) ;
    ColumnVector R1 ( 49 ) ; R1.fill( 0.0 ) ;
    ColumnVector R2 ( 49 ) ; R2.fill( 0.0 ) ;  
    ColumnVector pwr ( 49 ) ; pwr.fill( 0.0 ) ;
    ColumnVector dominant_cycle = args(0).column_vector_value () ; dominant_cycle.fill( 0.0 ) ;  
       
    double avglength = 3.0 ;
    double M ;
    double X ; double Y ;   
    double Sx ; double Sy ; double Sxx ; double Syy ; double Sxy ;
    double denom ;
    double max_pwr = 0.0 ;
    double Spx ; double Sp ;  
    
    // variables for highpass filter, hard coded for a high cutoff period of 48 bars and low cutoff of 10 bars
    double high_cutoff = 48.0 ; double low_cutoff = 10.0 ;  
    double alpha_1 = ( cos( 0.707 * 2.0 * PI / high_cutoff ) + sin( 0.707 * 2.0 * PI / high_cutoff ) - 1.0 ) / cos( 0.707 * 2.0 * PI / high_cutoff ) ;   
    double beta_1 = ( 1.0 - alpha_1 / 2.0 ) * ( 1.0 - alpha_1 / 2.0 ) ;
    double beta_2 = 2.0 * ( 1.0 - alpha_1 ) ;
    double beta_3 = ( 1.0 - alpha_1 ) * ( 1.0 - alpha_1 ) ;
     
    // variables for super smoother
    double a1 = exp( -1.414 * PI / low_cutoff ) ;
    double b1 = 2.0 * a1 * cos( 1.414 * PI / low_cutoff ) ;
    double c2 = b1 ;
    double c3 = -a1 * a1 ;
    double c1 = 1.0 - c2 - c3 ;
      
    // calculate the automatic gain control factor, K
    double K = 0.0 ;
    double accSlope = -1.5 ; //acceptableSlope = 1.5 dB
    double halfLC = low_cutoff / 2.0 ;
    double halfHC = high_cutoff / 2.0 ;
    double ratio = pow( 10 , accSlope / 20.0 ) ;
      
     if( halfHC - halfLC > 0.0 )
        {
      K = pow( ratio , 1.0 / ( halfHC - halfLC ) ) ;
        }
    
    // loop to initialise hp and smooth
    for ( octave_idx_type ii ( 2 ) ; ii < 49 ; ii++ ) // main loop
        {  
        // highpass filter components whose periods are < 48 bars
        hp(ii) = beta_1 * ( input(ii) - 2.0 * input(ii-1) + input(ii-2) ) + beta_2 * hp(ii-1) - beta_3 * hp(ii-2) ;
        
        // smooth with a super smoother filter
        smooth(ii) = c1 * ( hp(ii) + hp(ii-1) ) / 2.0 + c2 * smooth(ii-1) + c3 * smooth(ii-2) ;
        } // end of initial loop
    
    for ( octave_idx_type ii ( 49 ) ; ii < args(0).length () ; ii++ ) // main loop
        {  
        // highpass filter components whose periods are < 48 bars
        hp(ii) = beta_1 * ( input(ii) - 2.0 * input(ii-1) + input(ii-2) ) + beta_2 * hp(ii-1) - beta_3 * hp(ii-2) ;
        
        // smooth with a super smoother filter
        smooth(ii) = c1 * ( hp(ii) + hp(ii-1) ) / 2.0 + c2 * smooth(ii-1) + c3 * smooth(ii-2) ;
          
          // Pearson correlation for each value of lag
          for ( octave_idx_type lag (0) ; lag <= high_cutoff ; lag++ ) 
              {
              // set the averaging length as M
              M = avglength ;
              if ( avglength == 0) 
                 {
                  M = double( lag ) ; 
                 }
                 
              Sx = 0.0 ; Sy = 0.0 ; Sxx = 0.0 ; Syy = 0.0 ; Sxy = 0.0 ;
                
                for ( octave_idx_type count (0) ; count < M - 1 ; count++ )
                    {
                     X = smooth(ii-count) ; Y = smooth(ii-(lag+count)) ; 
                     Sx += X ; 
                     Sy += Y ;
                     Sxx += X * X ;
                     Sxy += X * Y ;
                     Syy += Y * Y ;
                    }
                 
                denom = ( M * Sxx - Sx * Sx ) * ( M * Syy - Sy * Sy ) ;    
                if ( denom > 0.0 )
                   {
                    corr(lag) = ( M * Sxy - Sx * Sy ) / sqrt( denom ) ;  
                   }    
                
              } // end of Pearson correlation loop        
    /*
        The DFT is accomplished by correlating the autocorrelation at each value of lag with the cosine and sine of each period of interest. 
        The sum of the squares of each of these values represents the relative power at each period.
    */                  
          for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
              {
               cosine_part( period ) = 0.0 ; sine_part( period ) = 0.0 ;
                
                for ( octave_idx_type N (3) ; N <= high_cutoff ; N++ )
                    {
                     cosine_part( period ) += corr( N ) * cos( 2.0 * PI * double( N ) / double( period ) ) ;  
                     sine_part( period ) += corr( N ) * sin( 2.0 * PI * double( N ) / double( period ) ) ; 
                    } // end of N loop
                    
                sq_sum( period ) = cosine_part( period ) * cosine_part( period ) + sine_part( period ) * sine_part( period ) ;
                    
              } // end of first period loop 
     
          // EMA is used to smooth the power measurement at each period          
          for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
              {
               R2( period ) = R1( period ) ;
               R1( period ) = 0.2 * sq_sum( period ) * sq_sum( period ) + 0.8 * R2( period ) ; 
              } // end of second period loop
          
        // Find maximum power level for normalisation      
        max_pwr = 0.0 ;      
              
          for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
              {
               if ( R1( period ) > max_pwr )
                  {
                    max_pwr = K * R1( period ) ;
                  }
              } // end of third period loop
        
        // normalisation of power      
          for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
              {
               pwr( period ) = R1( period ) / max_pwr ;
              } // end of fourth period loop 
          
        // compute the dominant cycle using the centre of gravity of the spectrum
        Spx = 0.0 ; Sp = 0.0 ;
        
          for ( octave_idx_type period (low_cutoff) ; period <= high_cutoff ; period++ )
              {
               if ( pwr( period ) >= 0.5 )
                  {
                   Spx += double( period ) * pwr( period ) ;
                   Sp += pwr( period ) ;  
                  }
              } // end of fifth period loop
        
        if ( Sp != 0.0 )
           {
            dominant_cycle(ii) = Spx / Sp ;
           }      
              
        } // end of main loop
        
    retval_list( 0 ) = dominant_cycle ;
    
    return retval_list ;
    
    } // end of function 
    When applied directly to a theoretical but noisy sine wave series with a trend I find that this autocorrelation method performs better than my current period measurement algo, but on detrended data it is not as good. Since it is trivial to detrend price data, for now I am going to stick with my current method.

    Monday, 25 April 2016

    Recursive Sine Wave Formula for Period Calculation

    Since my last post I have successfully managed to incorporate the deepmat toolbox into my code, so now my RBM pre-training uses Parallel tempering and adaptive learning rates, which is all well and good. The only draw back at the moment is the training time - it takes approximately 3 to 4 minutes per bar to train on a minimal set of 2 features because the toolbox is written in Octave code and uses for loops instead of using vectorisation. Obviously this is something that I would like to optimise, but for the nearest future I now want to concentrate on feature engineering and create a useful set of features for my CRBM.

    In the past I have blogged about frequency/period measurement ( e.g. here and here ) and in this post I would like to talk about a possible new way to calculate the dominant cycle period in the data. In a Stackoverflow forum post some time ago I was alerted to a recursive sinewave generator, with code, that shows how to forward generate a sine wave using just the last few values of a sine wave. It struck me that the code can be used, given the last three values of a sine wave, to calculate the period of the sine wave using simple linear regression, and in the code box below I give some Octave code which shows the basic idea.
    clear all
    
    % sine wave periods
    period = input( 'Enter period: ' )
    period2 = input( 'Enter period2: ' )
    
    true_periods = [ ones( 6*period , 1 ) .* period ; ones( 3*period2 , 1 ) .* period2 ; ones( 3*period , 1 ) .* period ] ;
    
    % create sine wave and add some noise
    price = awgn( 1 .* ( 2 .+ [ sinewave( 6*period , period )' ; sinewave( 3*period2 , period2 )' ; sinewave( 3*period , period )' ] ) , 100 ) ;
    
    % extract the signal
    hp = highpass_filter_basic( price ) ;
    
    % smooth the signal
    smooth = smooth_2_5( hp ) ;
    
    Y = smooth .+ shift( smooth , 2 ) ;
    X = shift( smooth , 1 ) ;
    
    calculated_periods = zeros( size ( price ) ) ;
    
    % do the linear regression
    for ii = 50 : size( price , 1 )
    calculated_periods(ii) = ( ( X( ii-4:ii , : )' * X( ii-4:ii , : ) ) \ X( ii-4:ii , : )' ) * Y( ii-4:ii , : ) ;
    end
    
    % get the periods from regression calculations
    calculated_periods = real( sqrt( ( 8 .- 4 .* calculated_periods ) ./ ( calculated_periods .+ 2 ) ) ) ;
    calculated_periods = 360 ./ ( ( calculated_periods .* 180 ) ./ pi ) ;
    calculated_periods = ema( calculated_periods , 3 ) ;
    calculated_periods = round( calculated_periods ) ;
    
    figure(1) ; plot( price , 'b' , "linewidth" , 2 , hp , 'r' , "linewidth" , 2 , smooth , 'g' , "linewidth" , 2 ) ; legend( 'Price' , 'Highpass' , 'Highpass smooth' ) ;
    figure(2) ; plot( true_periods , 'b' , "linewidth" , 2 , calculated_periods , 'r' , "linewidth", 2 ) ; legend( 'True Periods' , 'Calculated Periods' ) ;
    The code creates a sine wave with two periods ( user defined ), does the calculations and then plots the  sine wave and the periods in figures 1 and 2 respectively. The linear regression part of the code use the most recent five bars for calculation, which could of course also be user defined. On data without added noise typical plots are :-
    which shows the underlying "price" in blue and the high pass filtered and smoothed versions in red and green and
    shows the true and measured periods. Noisy price versions of the above are :-
    and
    Theoretically it seems to work, but I would like to see if things can be improved. More in my next post.


    Thursday, 31 March 2016

    Parallel Tempering and Adaptive Learning Rates in Restricted Boltzmann Machine Learning

    It has been a while since my last post and in the intervening time I have been busy working on the code of my previous few posts.

    During the course of this I have noticed that there are some further improvements to be made in terms of robustness etc. inspired by this Master's thesis, Improved Learning Algorithms for Restricted Boltzmann Machines, by KyungHyun Cho. Using the Deepmat Toolbox code available here as a guide, I now intend to further improve my code by incorporating the concepts of Parallel Tempering and adaptive learning rates for both the RBM and CRBM training.

    More in due course.

    Wednesday, 3 February 2016

    Refactored Denoising Autoencoder Update #2

    Below is this second code update.
    %  select rolling window length to use - an optimisable parameter via pso?
    rolling_window_length = 50 ;
    batchsize = 5 ;
    
    %  how-many timesteps do we look back for directed connections - this is what we call the "order" of the model 
    n1 = 3 ; % first "gaussian" layer order, a best guess just for batchdata creation purposes
    n2 = 3 ; % second "binary" layer order, a best guess just for batchdata creation purposes
    
    %  taking into account rolling_window_length, n1, n2 and batchsize, get total lookback length
    remainder = rem( ( rolling_window_length + n1 + n2 ) , batchsize ) ;
    
    if ( remainder > 0 ) % number of training examples with lookback and orders n1 and n2 not exactly divisable by batchsize
    lookback_length = ( rolling_window_length + n1 + n2 + ( batchsize - remainder ) ) ; % increase the lookback_length
    else                 % number of training examples with lookback and orders n1 and n2 exactly divisable by batchsize
    lookback_length = ( rolling_window_length + n1 + n2 ) ;
    end
    
    %  create batchdataindex using lookback_length to index bars in the features matrix
    batchdataindex = ( ( training_point_index - ( lookback_length - 1 ) ) : 1 : training_point_index )' ;
    batchdata = features( batchdataindex , : ) ;
    
    %  now that the batchdata has been created, check it for autocorrelation in the features
    all_ar_coeff = zeros( size( batchdata , 2 ) , 1 ) ;
    
      for ii = 1 : size( batchdata , 2 )
      ar_coeffs = arburg( batchdata( : , ii ) , 10 , 'FPE' ) ;
      all_ar_coeff( ii ) = length( ar_coeffs ) - 1 ;
      end
      
    %  set order of gaussian_crbm, n1, to be equal to the average length of any autocorrelation in the data
    n1 = round( mean( all_ar_coeff ) ) ;  
    
    %  z-normalise the batchdata matrix with the mean and std of columns 
    data_mean = mean( batchdata , 1 ) ;
    data_std = std( batchdata , 1 ) ;
    batchdata = ( batchdata .- repmat( data_mean , size( batchdata , 1 ) , 1 ) ) ./ repmat( data_std , size( batchdata , 1 ) , 1 ) ; % batchdata is now z-normalised by data_mean & data_std
    
    %  create the minibatch index matrix for gaussian rbm pre-training of directed weights w
    minibatch = ( 1 : 1 : size( batchdata , 1 ) ) ; minibatch( 1 : ( size( batchdata , 1 ) - rolling_window_length ) ) = [] ;
    minibatch = minibatch( randperm( size( minibatch , 2 ) ) ) ; minibatch = reshape( minibatch , batchsize , size( minibatch , 2 ) / batchsize ) ; 
    
    % PRE-TRAINING FOR THE VISABLE TO HIDDEN AND THE VISIBLE TO VISIBLE WEIGHTS %%%%
    % First create a training set and target set for the pre-training of gaussian layer
    dAuto_Encode_targets = batchdata ; dAuto_Encode_training_data = [] ;
    % dAuto_Encode_targets = batchdata( : , 2 : end ) ; dAuto_Encode_training_data = [] ; % if bias added to raw data
      
      % loop to create the dAuto_Encode_training_data ( n1 == "order" of the gaussian layer of crbm )
      for ii = 1 : n1
      dAuto_Encode_training_data = [ dAuto_Encode_training_data shift( batchdata , ii ) ] ;
      end
    
    % now delete the first n1 rows due to circular shift induced mismatch of data and targets
    dAuto_Encode_targets( 1 : n1 , : ) = [] ; dAuto_Encode_training_data( 1 : n1 , : ) = [] ;
    
    % DO RBM PRE-TRAINING FOR THE BOTTOM UP DIRECTED WEIGHTS W %%%%%%%%%%%%%%%%%%%%%
    % use rbm trained initial weights instead of using random initialisation for weights
    % Doing this because we are not using regularisation in the autoencoder pre-training
    epochs = 10000 ;
    hidden_layer_size = 4 * size( dAuto_Encode_targets , 2 ) ;
    [ w_weights , w_weights_hid_bias , w_weights_vis_bias ] = cc_gaussian_rbm( dAuto_Encode_targets , minibatch , epochs , hidden_layer_size , 0.05 ) ;
    % keep a copy of these original w_weights
    w1 = w_weights ;
    [ A_weights , A_weights_hid_bias , A_weights_vis_bias ] = cc_gaussian_rbm( dAuto_Encode_training_data , minibatch , epochs , size( dAuto_Encode_targets , 2 ) , 0.05 ) ;
    [ B_weights , B_weights_hid_bias , B_weights_vis_bias ] = cc_gaussian_rbm( dAuto_Encode_training_data , minibatch , epochs , hidden_layer_size , 0.05 ) ;
    
    % END OF RBM PRE-TRAINING OF AUTOENCODER WEIGHTS %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    figure(1) ; surf( A_weights ) ; title( 'A Weights after RBM training' ) ;
    figure(2) ; surf( B_weights ) ; title( 'B Weights after RBM training' ) ;
    figure(3) ; surf( w_weights ) ; title( 'w Weights after RBM training' ) ;
    figure(4) ; plot( A_weights_hid_bias , 'b' , B_weights_hid_bias , 'r' , w_weights_vis_bias , 'g' ) ; title( 'Biases after RBM training' ) ; legend( 'A' , 'B' , 'w' ) ;
    
    % DO THE AUTOENCODER TRAINING %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % create weight update matrices
    A_weights_update = zeros( size( A_weights ) ) ;
    A_weights_hid_bias_update = zeros( size( A_weights_hid_bias ) ) ;
    B_weights_update = zeros( size( B_weights ) ) ;
    B_weights_hid_bias_update = zeros( size( B_weights_hid_bias ) ) ;
    w_weights_update = zeros( size( w_weights ) ) ;
    w_weights_vis_bias_update = zeros( size( w_weights_vis_bias ) ) ;
    
    % for adagrad
    historical_A = zeros( size( A_weights ) ) ;
    historical_A_hid_bias = zeros( size( A_weights_hid_bias ) ) ;
    historical_B = zeros( size( B_weights ) ) ;
    historical_B_hid_bias = zeros( size( B_weights_hid_bias ) ) ;
    historical_w = zeros( size( w_weights ) ) ;
    historical_w_vis_bias = zeros( size( w_weights_vis_bias ) ) ;
    
    % set some training parameters
    n = size( dAuto_Encode_training_data , 1 ) ; % number of training examples in dAuto_Encode_training_data
    input_layer_size = size( dAuto_Encode_training_data , 2 ) ;
    fudge_factor = 1e-6 ; % for numerical stability for adagrad
    learning_rate = 0.01 ; % will be changed to 0.001 after 50 iters through epoch loop
    mom = 0 ;            % will be changed to 0.9 after 50 iters through epoch loop
    noise = 0.5 ;
    epochs = 1000 ;
    cost = zeros( epochs , 1 ) ;
    lowest_cost = inf ;
    
      % Stochastic Gradient Descent training over dAuto_Encode_training_data 
      for iter = 1 : epochs
       
          % change momentum and learning_rate after 50 iters
          if iter == 50
          mom = 0.9 ;
          learning_rate = 0.001 ;
          end
      
          index = randperm( n ) ; % randomise the order of training examples
         
          for training_example = 1 : n
          
          % Select data for this training batch
          tmp_X = dAuto_Encode_training_data( index( training_example ) , : ) ;
          tmp_T = dAuto_Encode_targets( index( training_example ) , : ) ;
          
          % Randomly black out some of the input training data
          tmp_X( rand( size( tmp_X ) ) < noise ) = 0 ;
          
          % feedforward tmp_X through B_weights and get sigmoid e.g ret = 1.0 ./ ( 1.0 + exp(-input) )
          tmp_X_through_sigmoid = 1.0 ./ ( 1.0 .+ exp( - ( tmp_X * B_weights .+ B_weights_hid_bias ) ) ) ;
          
          % Randomly black out some of tmp_X_through_sigmoid for dropout training
          tmp_X_through_sigmoid( rand( size( tmp_X_through_sigmoid ) ) < noise ) = 0 ;
        
          % feedforward tmp_X through A_weights and add to tmp_X_through_sigmoid * w_weights for linear output layer
          final_output_layer = ( tmp_X * A_weights .+ A_weights_hid_bias ) .+ ( tmp_X_through_sigmoid * w_weights' .+ w_weights_vis_bias ) ;
        
          % now do backpropagation
          % this is the derivative of weights for the linear final_output_layer
          delta_out = ( tmp_T - final_output_layer ) ;
          
          % NOTE! gradient of sigmoid function g = sigmoid(z) .* ( 1.0 .- sigmoid(z) )
          sig_grad = tmp_X_through_sigmoid .* ( 1 .- tmp_X_through_sigmoid ) ; 
          
          % backpropagation only through the w_weights that are connected to tmp_X_through_sigmoid
          delta_hidden = ( delta_out * w_weights ) .* sig_grad ;
          
          % apply deltas from backpropagation with adagrad for the weight updates
          historical_A = historical_A .+ ( tmp_X' * delta_out ).^2 ;    
          A_weights_update = mom .* A_weights_update .+ ( learning_rate .* ( tmp_X' * delta_out ) ) ./ ( fudge_factor .+ sqrt( historical_A ) ) ;
          
          historical_A_hid_bias = historical_A_hid_bias .+ delta_out.^2 ;
          A_weights_hid_bias_update = mom .* A_weights_hid_bias_update .+ ( learning_rate .* delta_out ) ./ ( fudge_factor .+ sqrt( historical_A_hid_bias ) ) ;
          
          historical_w = historical_w .+ ( delta_out' * tmp_X_through_sigmoid ).^2 ;
          w_weights_update = mom .* w_weights_update .+ ( learning_rate .* ( delta_out' * tmp_X_through_sigmoid ) ) ./ ( fudge_factor .+ sqrt( historical_w ) ) ;
          
          historical_w_vis_bias = historical_w_vis_bias .+ delta_out.^2 ;
          w_weights_vis_bias_update = mom .* w_weights_vis_bias_update .+ ( learning_rate .* delta_out ) ./ ( fudge_factor .+ sqrt( historical_w_vis_bias ) ) ;
          
          historical_B = historical_B .+ ( tmp_X' * delta_hidden ).^2 ;
          B_weights_update = mom .* B_weights_update .+ ( learning_rate .* ( tmp_X' * delta_hidden ) ) ./ ( fudge_factor .+ sqrt( historical_B ) ) ;
          
          historical_B_hid_bias = historical_B_hid_bias .+ delta_hidden.^2 ;
          B_weights_hid_bias_update = mom .* B_weights_hid_bias_update .+ ( learning_rate .* delta_hidden ) ./ ( fudge_factor .+ sqrt( historical_B_hid_bias ) ) ;
          
          % update the weight matrices with weight_updates
          A_weights = A_weights + A_weights_update ;
          A_weights_hid_bias = A_weights_hid_bias + A_weights_hid_bias_update ;
          B_weights = B_weights + B_weights_update ;
          B_weights_hid_bias = B_weights_hid_bias + B_weights_hid_bias_update ;
          w_weights = w_weights + w_weights_update ;
          w_weights_vis_bias = w_weights_vis_bias + w_weights_vis_bias_update ;
          
          end % end of training_example loop
      
      % feedforward with this epoch's updated weights
      epoch_trained_tmp_X_through_sigmoid = 1.0 ./ ( 1.0 .+ exp( -( dAuto_Encode_training_data * B_weights .+ repmat( B_weights_hid_bias , size( dAuto_Encode_training_data , 1 ) , 1 ) ) ) ) ;
      epoch_trained_output = ( dAuto_Encode_training_data * A_weights .+ repmat( A_weights_hid_bias , size( dAuto_Encode_training_data , 1 ) , 1 ) )...
                              .+ ( epoch_trained_tmp_X_through_sigmoid * w_weights' .+ repmat( w_weights_vis_bias , size( epoch_trained_tmp_X_through_sigmoid , 1 ) , 1 ) ) ;
     
      % get sum squared error cost
      cost( iter , 1 ) = sum( sum( ( dAuto_Encode_targets .- epoch_trained_output ) .^ 2 ) ) ;
      
        % record best so far
        if cost( iter , 1 ) <= lowest_cost
           lowest_cost = cost( iter , 1 ) ;
           iter_min = iter ;
           best_A = A_weights ;
           best_B = B_weights ;
           best_w = w_weights ;
        end
      
      end % end of backpropagation epoch loop
    
    % plot weights
    figure(5) ; surf( best_A ) ; title( 'Best A Weights' ) ;
    figure(6) ; surf( best_B ) ; title( 'Best B Weights' ) ;
    figure(7) ; surf( best_w ) ; title( 'Best w Weights' ) ;
    figure(8) ; plot( A_weights_hid_bias , 'b' , B_weights_hid_bias , 'r' , w_weights_vis_bias , 'g' ) ; title( 'Biases after Autoencoder training' ) ; legend( 'A' , 'B' , 'w' ) ;
    figure(9) ; plot( cost ) ; title( 'Evolution of Autoencoder cost' ) ;
    
    % END OF CRBM WEIGHT PRE-TRAINING %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    The changes from the previous code update are a slightly different way to handle the bias units, the introduction of hidden and visible bias units from Restricted Boltzmann machine (RBM) pre-training and the introduction of an automated way to select the "order" of the Conditional Restricted Boltzmann machine (CRBM).

    The order of a CRBM is how many time steps we look back in order to model the autoregressive components. This could be decided heuristically or through cross validation but I have decided to use the Octave "arburg" function to "auto-magically" select this look back length, the idea being that the data itself informs this decision and makes the whole CRBM training algorithm adaptive to current conditions. Since the ultimate point of the CRBM will be to make predictions of future OHLC values I have chosen to use the final prediction error model selection criteria for the arburg function.

    Now that the bulk of this coding has been completed I think it would be useful to describe the proposed work flow of the various components.
    • the data and its derived inputs, such as indicators etc, are input to a Gaussian RBM as a weight initialisation step for the denoising autoencoder training. A Gaussian RBM is used because the data are real valued and not binary. This step is typical of what happens in deep learning and helps to extract meaningful features from the raw data in an unsupervised manner
    • the data and RBM initialised weights are then input to the denoising autoencoder to further model the weights and to take into account the autoregressive components of the data
    • these twice modelled weights are then used as the initial weights for the CRBM training of a Gaussian-Binary CRBM layer
    • the hidden layer of the above Gaussian-Binary CRBM is then used as data for a second Binary-Binary CRBM layer which will be stacked. The training for this second layer will follow the format above, i.e. RBM and denoising autoencoder pre-training of weights
    The next step will be for me to compile the denoising autoencoder code into an Octave C++ .oct function for speed optimisation purposes. 


    Thursday, 21 January 2016

    Refactored Denoising Autoencoder Code Update

    This code box contains updated code from my previous post. The main change is the inclusion of bias units for the directed auto-regressive weights and the visible to hidden weights. In addition there is code showing how data is pre-processed into batches/targets for the pre-training and code showing how the weight matrices are manipulated into a form suitable for my existing optimised crbm code for gaussian units.
    %  select rolling window length to use - an optimisable parameter via pso?
    rolling_window_length = 50 ;
    
    %  how-many timesteps do we look back for directed connections - this is what we call the "order" of the model 
    n1 = 3 ; % first "gaussian" layer order
    n2 = 3 ; % second "binary" layer order
    batchsize = 5 ;
    
    %  taking into account rolling_window_length, n1, n2 and batchsize, get total lookback length
    remainder = rem( ( rolling_window_length + n1 + n2 ) , batchsize ) ;
    
    if ( remainder > 0 ) % number of training examples with lookback and orders n1 and n2 not exactly divisable by batchsize
    lookback_length = ( rolling_window_length + n1 + n2 + ( batchsize - remainder ) ) ; % increase the lookback_length
    else                 % number of training examples with lookback and orders n1 and n2 exactly divisable by batchsize
    lookback_length = ( rolling_window_length + n1 + n2 ) ;
    end
    
    %  create batchdataindex using lookback_length to index bars in the features matrix
    batchdataindex = ( ( training_point_index - ( lookback_length - 1 ) ) : 1 : training_point_index )' ;
    batchdata = features( batchdataindex , : ) ;
    
    %  z-normalise the batchdata matrix with the mean and std of columns 
    data_mean = mean( batchdata , 1 ) ;
    data_std = std( batchdata , 1 ) ;
    batchdata = ( batchdata .- repmat( data_mean , size( batchdata , 1 ) , 1 ) ) ./ repmat( data_std , size( batchdata , 1 ) , 1 ) ; % batchdata is now z-normalised by data_mean & data_std
    % add bias neurons
    batchdata = [ ones( size( batchdata , 1 ) , 1 ) batchdata ] ;
    
    %  create the minibatch index matrix for gaussian rbm pre-training of directed weights w
    minibatch = ( 1 : 1 : size( batchdata , 1 ) ) ; minibatch( 1 : ( size( batchdata , 1 ) - rolling_window_length ) ) = [] ;
    minibatch = minibatch( randperm( size( minibatch , 2 ) ) ) ; minibatch = reshape( minibatch , batchsize , size( minibatch , 2 ) / batchsize ) ; 
    
    % PRE-TRAINING FOR THE VISABLE TO HIDDEN AND THE VISIBLE TO VISIBLE WEIGHTS %%%%
    % First create a training set and target set for the pre-training
    dAuto_Encode_targets = batchdata( : , 2 : end ) ; dAuto_Encode_training_data = [] ;
      
      % loop to create the dAuto_Encode_training_data ( n1 == "order" of the gaussian layer of crbm )
      for ii = 1 : n1
      dAuto_Encode_training_data = [ dAuto_Encode_training_data shift( batchdata , ii ) ] ;
      end
    
    % now delete the first n1 rows due to circular shift induced mismatch of data and targets
    dAuto_Encode_targets( 1 : n1 , : ) = [] ; dAuto_Encode_training_data( 1 : n1 , : ) = [] ;
    % add bias
    %dAuto_Encode_training_data = [ ones( size( dAuto_Encode_training_data , 1 ) , 1 ) dAuto_Encode_training_data ] ; 
    % bias units idx
    bias_idx = ( 1 : size( batchdata , 2 ) : size( dAuto_Encode_training_data , 2 ) ) ;
    
    % DO RBM PRE-TRAINING FOR THE BOTTOM UP DIRECTED WEIGHTS W %%%%%%%%%%%%%%%%%%%%%
    % use rbm trained initial weights instead of using random initialisation for weights
    % Doing this because we are not using regularisation in the autoencoder pre-training
    epochs = 10000 ;
    hidden_layer_size = 2 * size( dAuto_Encode_targets , 2 ) ;
    w_weights = gaussian_rbm( dAuto_Encode_targets , minibatch , epochs , hidden_layer_size ) ;
    % keep a copy of these original w_weights
    w1 = w_weights ;
    A_weights = gaussian_rbm( dAuto_Encode_training_data , minibatch , epochs , size( dAuto_Encode_targets , 2 ) ) ;
    B_weights = gaussian_rbm( dAuto_Encode_training_data , minibatch , epochs , hidden_layer_size ) ;
    
    % create weight update matrices
    A_weights_update = zeros( size( A_weights ) ) ;
    B_weights_update = zeros( size( B_weights ) ) ;
    w_weights_update = zeros( size( w_weights ) ) ;
    
    % for adagrad
    historical_A = zeros( size( A_weights ) ) ;
    historical_B = zeros( size( B_weights ) ) ;
    historical_w = zeros( size( w_weights ) ) ;
    
    % set some training parameters
    n = size( dAuto_Encode_training_data , 1 ) ; % number of training examples in dAuto_Encode_training_data
    input_layer_size = size( dAuto_Encode_training_data , 2 ) ;
    fudge_factor = 1e-6 ; % for numerical stability for adagrad
    learning_rate = 0.1 ; % will be changed to 0.01 after 50 iters through epoch loop
    mom = 0 ;             % will be changed to 0.9 after 50 iters through epoch loop
    noise = 0.5 ;
    epochs = 1000 ;
    cost = zeros( epochs , 1 ) ;
    lowest_cost = inf ;
    
      % Stochastic Gradient Descent training over dAuto_Encode_training_data 
      for iter = 1 : epochs
       
          % change momentum and learning_rate after 50 iters
          if iter == 50
          mom = 0.9 ;
          learning_rate = 0.01 ;
          end
      
          index = randperm( n ) ; % randomise the order of training examples
         
          for training_example = 1 : n
          
          % Select data for this training batch
          tmp_X = dAuto_Encode_training_data( index( training_example ) , : ) ;
          tmp_T = dAuto_Encode_targets( index( training_example ) , : ) ;
          
          % Randomly black out some of the input training data
          tmp_X( rand( size( tmp_X ) ) < noise ) = 0 ;
          % but keep bias units
          tmp_X( bias_idx ) = 1 ;
          
          % feedforward tmp_X through B_weights and get sigmoid e.g ret = 1.0 ./ ( 1.0 + exp(-input) )
          tmp_X_through_sigmoid = 1.0 ./ ( 1.0 .+ exp( - B_weights * tmp_X' ) ) ;
          
          % Randomly black out some of tmp_X_through_sigmoid for dropout training
          tmp_X_through_sigmoid( rand( size( tmp_X_through_sigmoid ) ) < noise ) = 0 ;
        
          % feedforward tmp_X through A_weights and add to tmp_X_through_sigmoid * w_weights for linear output layer
          final_output_layer = ( tmp_X * A_weights' ) .+ ( tmp_X_through_sigmoid' * w_weights ) ;
        
          % now do backpropagation
          % this is the derivative of weights for the linear final_output_layer
          delta_out = ( tmp_T - final_output_layer ) ;
          
          % NOTE! gradient of sigmoid function g = sigmoid(z) .* ( 1.0 .- sigmoid(z) )
          sig_grad = tmp_X_through_sigmoid .* ( 1 .- tmp_X_through_sigmoid ) ; 
          
          % backpropagation only through the w_weights that are connected to tmp_X_through_sigmoid
          delta_hidden = ( w_weights * delta_out' ) .* sig_grad ;
          
          % apply deltas from backpropagation with adagrad for the weight updates
          historical_A = historical_A .+ ( delta_out' * tmp_X ).^2 ;    
          A_weights_update = mom .* A_weights_update .+ ( learning_rate .* ( delta_out' * tmp_X ) ) ./ ( fudge_factor .+ sqrt( historical_A ) ) ;
          
          historical_w = historical_w .+ ( tmp_X_through_sigmoid * delta_out ).^2 ;
          w_weights_update = mom .* w_weights_update .+ ( learning_rate .* ( tmp_X_through_sigmoid * delta_out ) ) ./ ( fudge_factor .+ sqrt( historical_w ) ) ;
          
          historical_B = historical_B .+ ( delta_hidden * tmp_X ).^2 ;
          B_weights_update = mom .* B_weights_update .+ ( learning_rate .* ( delta_hidden * tmp_X ) ) ./ ( fudge_factor .+ sqrt( historical_B ) ) ;
          
          % update the weight matrices with weight_updates
          A_weights = A_weights + A_weights_update ;
          B_weights = B_weights + B_weights_update ;
          w_weights = w_weights + w_weights_update ;
          
          end % end of training_example loop
      
      % feedforward with this epoch's updated weights
      epoch_trained_tmp_X_through_sigmoid = ( 1.0 ./ ( 1.0 .+ exp( -( ( B_weights ) * dAuto_Encode_training_data' ) ) ) ) ;
      epoch_trained_output = ( dAuto_Encode_training_data * ( A_weights )' ) .+ ( epoch_trained_tmp_X_through_sigmoid' * ( w_weights ) ) ;
     
      % get sum squared error cost
      cost( iter , 1 ) = sum( sum( ( dAuto_Encode_targets .- epoch_trained_output ) .^ 2 ) ) ;
      
        % record best so far
        if cost( iter , 1 ) <= lowest_cost
           lowest_cost = cost( iter , 1 ) ;
           iter_min = iter ;
           best_A = A_weights ;
           best_B = B_weights ;
           best_w = w_weights ;
        end
      
      end % end of backpropagation loop
    
    lowest_cost                                        % print final cost to terminal
    iter_min ;                                           % and the iter it occured on
    graphics_toolkit( "qt" ) ;
    figure(1) ; plot( cost , 'r' , 'linewidth' , 3 ) ; % and plot the cost curve
    
    % plot weights
    graphics_toolkit( "gnuplot" ) ;
    figure(2) ; surf( best_A ) ; title( 'Best A Weights' ) ;
    figure(3) ; surf( best_B ) ; title( 'Best B Weights' ) ;
    figure(4) ; surf( best_w ) ; title( 'Best w Weights' ) ;
    
    % END OF CRBM WEIGHT PRE-TRAINING %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    % extract bias weights from best_A and best_B
    A_bias = best_A( : , bias_idx ) ; best_A( : , bias_idx ) = [] ; A_bias = sum( A_bias , 2 ) ; 
    B_bias = best_B( : , bias_idx ) ; best_B( : , bias_idx ) = [] ; B_bias = sum( B_bias , 2 ) ;
    
    % now delete bias units from batchdata
    batchdata( : , 1 ) = [] ;
    
    % create reshaped structures to hold A_weights and B_weights
    A1 = reshape( best_A , size( best_A , 1 ) , size( best_A , 2 ) / n1 , n1 ) ;
    B1 = reshape( best_B , size( best_B , 1 ) , size( best_B , 2 ) / n1 , n1 ) ;
    The following video shows the evolution of the weights whilst training over 20 consecutive price bars. The top three panes are the weights after the denoising autoencoder training and the bottom three are the same weights after being used as initialisation weights for the CRBM training and then being modified by this CRBM training. It is this final set of weights that would typically be used for CRBM generation.
    non-embedded view