## Monday, 20 November 2017

### Candlestick Plotting Function for Octave.

I have long been frustrated by the lack of an "out of the box" solution for plotting OHLC candlestick charts natively in Octave, the closest solution I know being the highlow plot function from the financial package ( which does not yet implement a candle function ) over at Octave Sourceforge. This being the case, I decided to write my own candlestick plotting functions, the codes for which are shown below.

This first, basic version just plots a candlestick chart with blue for up bars ( close higher than open ) or red for down bars ( close less than open ).
## Copyright (C) 2017 dekalog
##
## This program is free software; you can redistribute it and/or modify it
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .
## -*- texinfo -*-
## @deftypefn {} {@var{retval} =} candles (@var{O}, @var{H}, @var{L}, @var{C})
##
## Takes O, H, L and C inputs and plots a candlestick chart, with blue bars for
## close up days and red bars for close down days.
##
## @seealso{}
## @end deftypefn

## Author: dekalog
## Created: 2017-11-16

function [retval] = candles ( open , high , low , close )

if ( nargin != 4 )
error ( "Not enough input arguments. Should be OHLC vectors." ) ;
endif

wicks = high .- low ;
body = close .- open ;
up_down = sign( body ) ;

hold on ;
% plot the wicks
x = ( 1 : length( close ) ) ; % the x-axis
idx = x ;
high_nan = nan( size( high ) ) ; high_nan( idx ) = high ; % highs
low_nan = nan( size( low ) ) ; low_nan( idx ) = low ;     % lows
x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
plot( x , y , "k" , 'linewidth' , 2 ) ;                   % plot black wicks

% plot the up bars
x = ( 1 : length( high ) ) ;                                      % the x-axis
idx = ( up_down == 1 ) ; idx = find( idx ) ;                      % index by condition
high_nan = nan( size( high ) ) ; high_nan( idx ) = close( idx ) ; % index closes > opens
low_nan = nan( size( low ) ) ; low_nan( idx ) = open( idx ) ;     % index opens < closes
x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
plot( x , y , "b" , 'linewidth' , 20 ) ;                           % plot blue up bars

% plot the down bars
x = ( 1 : length( high ) ) ;                                      % the x-axis
idx = ( up_down == -1 ) ; idx = find( idx ) ;                     % index by condition
high_nan = nan( size( high ) ) ; high_nan( idx ) = open( idx ) ;  % index opens > closes
low_nan = nan( size( low ) ) ; low_nan( idx ) = close( idx ) ;    % index closes < opens
x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
plot( x , y , "r" , 'linewidth' , 20 ) ;                          % plot red down bars

hold off ;

endfunction

The second version is a conditional plotting version which takes a condition vector as input along with the OHLC vectors and plots candlesticks with different colours according to the condition in the condition vector ( conditions are integers 1 to 3 inclusive ).
## Copyright (C) 2017 dekalog
##
## This program is free software; you can redistribute it and/or modify it
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .
## -*- texinfo -*-
## @deftypefn {} {@var{retval} =} candles (@var{O}, @var{H}, @var{L}, @var{C}, @var{TC})
##
## Takes OHLC inputs and a TC vector and plots a candlestick chart, with up bars coloured
## cyan or blue and down bars magenta or red, dependent on value contained in the
## condition vector TC ( values == 1 or == 2 ).
##
## @seealso{}
## @end deftypefn

## Author: dekalog
## Created: 2017-11-16

function [retval] = candles_tc ( open , high , low , close , tc )

if ( nargin != 5 )
error ( "Not enough input arguments. Should be OHLC and a condition vector" ) ;
endif

if ( length( unique( tc ) ) > 3 )
error ( "Too many conditions in condition vector. Maximum of 3 conditions allowed." ) ;
endif

wicks = high .- low ;
body = close .- open ;
up_down = sign( body ) ;
up_colours = "cbg" ;
down_colours = "mrk" ;
candle_body_width = 20 ;

hold on ;

% plot the wicks
x = ( 1 : length( close ) ) ; % the x-axis
idx = x ;
high_nan = nan( size( high ) ) ; high_nan( idx ) = high ; % highs
low_nan = nan( size( low ) ) ; low_nan( idx ) = low ;     % lows
x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
plot( x , y , "k" , 'linewidth' , 2 ) ;                   % plot black wicks

for ii = 1 : length( unique( tc ) )

% plot the up bars for ii condition
x = ( 1 : length( high ) ) ;                                      % the x-axis
idx = ( up_down == 1 ) .* ( tc == ii ) ; idx = find( idx ) ;      % index by condition
high_nan = nan( size( high ) ) ; high_nan( idx ) = close( idx ) ; % index closes > opens
low_nan = nan( size( low ) ) ; low_nan( idx ) = open( idx ) ;     % index opens < closes
x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
plot( x , y , up_colours( ii ) , 'linewidth' , candle_body_width ) ;

% plot the down bars for ii condition
x = ( 1 : length( high ) ) ;                                      % the x-axis
idx = ( up_down == -1 ) .* ( tc == ii ) ; idx = find( idx ) ;     % index by condition
high_nan = nan( size( high ) ) ; high_nan( idx ) = open( idx ) ;  % index opens > closes
low_nan = nan( size( low ) ) ; low_nan( idx ) = close( idx ) ;    % index closes < opens
x = reshape( [ x ; x ; nan( size( x ) ) ] , [] , 1 ) ;
y = reshape( [ high_nan(:)' ; low_nan(:)' ; nan( 1 , length( high ) ) ] , [] , 1 ) ;
plot( x , y , down_colours( ii ) , 'linewidth' , candle_body_width ) ;

endfor

hold off ;

endfunction

An example of a plot by this second version is
There are two conditions being plotted on this 1 hour chart: cyan up bars and magenta down bars are bars that occur in the "Asian session," i.e. after 17:00 New York Time local time and before 07:00 London local time; and blue up bars and red down bars are bars that occur in the overlapping London - New York session, i.e. between 07:00 London local time and 17:00 New York local time.

The horizontal black lines are not part of the basic plot function but are added later by use of the "hold" function. These lines are the "Tokyo Channel," i.e. the high and low of the Asian session extended into the immediately following London - New York session.

I hope readers who use Octave will find these plotting functions useful.

## Tuesday, 31 October 2017

### Prepending Historical Data with Oanda's R API

As a follow on to my previous post, which was about appending data, the script below prepends historical data to an assumed existing data record.
% cd to the hourly data directory
setwd("~/Documents/octave/oanda_data/hourly")

for( ii in 1 : nrow( all_current_historical_data_list ) ) {

instrument = all_current_historical_data_list[ ii , 1 ]

current_ohlc_record = read.table( file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , header = FALSE , na = "" , sep = "," ,
stringsAsFactors = FALSE )

current_ohlc_record_begin_date_time = as.character( current_ohlc_record[ 1 , 1 ] ) % get the date/time value to be matched
last_date_ix = as.Date( current_ohlc_record[ 1 , 1 ] )                             % the end date for new data to be downloaded

% last 40 weeks of hourly data approx = 5000 hourly bars
begin_date_ix = as.Date( last_date_ix - 280 )                      % the begin date for new data to be downloaded

new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument,
begin_date_ix , last_date_ix + 2 ) % +2 to ensure that the end of the new downloaded data will
% overlap with the beginning of current_ohlc_record

% having ensured no data is missed by overlaping with the current_ohlc_record, delete duplicated OHLC information
new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) % vector to search for the above date value

ix = charmatch( current_ohlc_record_begin_date_time , new_historical_data_date_times ) % get the matching index value

% delete that part of new_historical_data which is already contained in filename
new_historical_data = new_historical_data[ -( ix : nrow( new_historical_data ) ) , ]

% before prepending new_historical_data in front of current_ohlc_record, need to give names to current_ohlc_record as
% rbind needs to bind by named attributes
names( current_ohlc_record ) = names( new_historical_data )

% see https://stackoverflow.com/questions/11785710/rbind-function-changing-my-entries for reason for following
% also need to coerce that dates in new_historical_data from POSIXct to character
new_historical_data$TimeStamp = as.character( new_historical_data$TimeStamp )

% and now prepend new_historical_data to current_ohlc_record
combined_records = rbind( new_historical_data , current_ohlc_record , stringsAsFactors = FALSE )

% and coerce character dates back to a POSIXct date format prior to printing
combined_records$TimeStamp = as.POSIXct( combined_records$TimeStamp )

% write combined_records to file
write.table( combined_records , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE ,
col.names = FALSE , sep = "," )

% and amend Instrument_update file with lastest update information
all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length

% write updated Instrument_update_file to file
write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE )

} % end of for all_current_historical_data_list loop
As described in the previous post the function HisPricesDates is called to do the actual downloading, with the relevant dates for function input being read and calculated from the existing data file ( I have hard coded for hourly data but this can, of course, be changed or implemented as user input in the R session). As usual I have commented the script to explain what is going on.

However, one important caveat is that it is assumed that there is actually some Oanda data to download and prepend prior the the earliest date in the existing  record, and there are no checks of this assumption. Therefore, the script might fail in unexpected ways if one attempts to reach too far back in history for the prependable data.

## Friday, 27 October 2017

### Updating Historical Data Using Oanda's API and R

The main function to do this, HisPricesDates, downloads data between given dates as function inputs and is shown below.
HisPricesDates  = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Start, End ){

% a typical Oanda API call might look like
% which is slowly built up by using the R paste function, commented at end of each line below

auth           = c(Authorization = paste("Bearer",AccountToken,sep=" "))
qstart = paste("start=",Start,sep="")                                % start=2014-03-21
qend   = paste("end=",End,sep="")                                    % end=2014-04-21
qcandleFormat  = "candleFormat=midpoint"                             % candleFormat=midpoint
qgranularity   = paste("granularity=",Granularity,sep="")            % granularity=D
qdailyalignment    = paste("dailyAlignment=",DayAlign,sep="")        % dailyAlignment=0
qincludeFirst = "includeFirst=false"                                 % includeFirst=false
QueryHistPrec2 = paste(QueryHistPrec1,qgranularity,qstart,qend,qcandleFormat,qincludeFirst,qdailyalignment,sep="&")
InstHistPjson = fromJSON(InstHistP, simplifyDataFrame = TRUE)
Prices        = data.frame(InstHistPjson[[3]])
Prices$time = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ") colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete") Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC") attributes(Prices$TimeStamp)tzone = TimeAlign return(Prices) } This function is called by two R scripts, one for downloading daily data and one for intraday data. The daily update script, which is shown next, % cd to the daily data directory setwd("~/Documents/octave/oanda_data/daily") all_current_historical_data_list = read.table("instrument_daily_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") ) for( ii in 1 : nrow( all_current_historical_data_list ) ) { instrument = all_current_historical_data_list[ ii , 1 ] % read second column of dates in all_current_historical_data_list as a date index date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] ) todays_date = as.Date( Sys.time() ) % download the missing historical data from date_ix to todays_date, if and only if, date_ix != todays_date if( date_ix + 1 != todays_date ) { new_historical_data = HisPricesDates( Granularity = "D", DayAlign, TimeAlign, AccountToken, instrument, date_ix , todays_date ) % the new_historical_data might only try to add incomplete OHLC data, in which case do not actually % want to update, so only update if we will be adding new, complete OHLC information if ( nrow( new_historical_data ) >= 2 & new_historical_data[ 2 , 7 ] == TRUE ) { % now do some data manipulation % expect date of last line in Instrument_update_file == date of first line in new_historical_data if ( date_ix == as.Date( new_historical_data[ 1 , 1 ] ) ) { % this is the case if true new_historical_data = new_historical_data[ -1 , ] % so delete first row of new_historical_data } % similarly, expect last line of new_historical_data to be an incomplete OHLC bar if ( new_historical_data[ nrow( new_historical_data) , 7 ] == FALSE) { % if so, new_historical_data = new_historical_data[ -nrow( new_historical_data) , ] % delete this last line } % append new_historical_data to the relevant raw data file write.table( new_historical_data , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" , col.names = FALSE , sep = "," , append = TRUE ) added_data_length = nrow( new_historical_data ) new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] ) % and amend Instrument_update file with lastest update information all_current_historical_data_list[ ii , 2 ] = new_last_date all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length } % end of download if statement } % end of ( date_ix != todays_date ) if statement } % end of for all_current_historical_data_list loop % Write updated Instrument_update_file to file write.table( all_current_historical_data_list , file = "instrument_daily_update_file" , row.names = FALSE , col.names = FALSE , na = "" ) has if statements as control structures to check that there is likely to be new daily data to actually download. It does this by checking a last_update date contained in an "instrument_daily_update_file" and comparing this with the current OS system time. If there is likely to be new data, the script runs and then updates this "instrument_daily_update_file." If not, the script exits with nothing having been done. The intraday update script doe not have the checks the daily script has because I assume there will always be some new intraday data available for download. In this case, the last_update date is read from the "instrument_update_file" purely to act as an input to the above HisPricesDates function. As a result, this script involves some data manipulation to ensure that duplicate data is not printed to file. This script is shown next and is heavily commented to explain what is happening. % cd to the hourly data directory setwd("~/Documents/octave/oanda_data") all_current_historical_data_list = read.table("instrument_hourly_update_file",header=FALSE,sep="",colClasses=c("character","Date","numeric") ) for( ii in 1 : nrow( all_current_historical_data_list ) ) { instrument = all_current_historical_data_list[ ii , 1 ] % read second column of dates in all_current_historical_data_list as a date index date_ix = as.Date( all_current_historical_data_list[ ii , 2 ] ) todays_date = as.Date( Sys.time() ) % download the missing historical data from date_ix to todays_date. If date_ix == todays_date, will download all % hourly bars for today only. new_historical_data = HisPricesDates( Granularity = "H1", DayAlign, TimeAlign, AccountToken, instrument, date_ix , todays_date + 1 ) % the new_historical_data will almost certainly have incomplete hourly OHLC data in its last line, % so delete this incomplete OHLC information if ( new_historical_data[ nrow( new_historical_data ) , 7 ] == FALSE ) { new_historical_data = new_historical_data[ -nrow( new_historical_data ) , ] } % read the last line only of the current OHLC file for this instrument file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) % get the filename system_command = paste( "tail -1" , file , sep = " " ) % create a unix system command to read the last line of this file % read the file's last line old_historical_data = read.csv( textConnection( system( system_command , intern = TRUE ) ) , header = FALSE , sep = "," , stringsAsFactors = FALSE ) old_historical_data_end_date_time = old_historical_data[ 1 , 1 ] % get the date value to be matched new_historical_data_date_times = as.character( new_historical_data[ , 1 ] ) % vector to search for the above date value ix = charmatch( old_historical_data_end_date_time , new_historical_data_date_times ) % get the matching index value % delete that part of new_historical_data which is already contained in filename new_historical_data = new_historical_data[ -( 1 : ix ) , ] % append new_historical_data to the relevant raw data file write.table( new_historical_data , file = paste( instrument , "raw_OHLC_hourly" , sep = "_" ) , row.names = FALSE , na = "" , col.names = FALSE , sep = "," , append = TRUE ) added_data_length = nrow( new_historical_data ) % length of added new data new_last_date = as.Date( new_historical_data[ added_data_length , 1 ] ) % date of last update % and amend Instrument_update file with lastest update information all_current_historical_data_list[ ii , 2 ] = new_last_date all_current_historical_data_list[ ii , 3 ] = all_current_historical_data_list[ ii , 3 ] + added_data_length } % end of for all_current_historical_data_list loop % finally, write updated Instrument_update_file to file write.table( all_current_historical_data_list , file = "instrument_hourly_update_file" , row.names = FALSE , col.names = FALSE , na = "" ) There is one important thing to point out on lines 29 to 33, which is that this section of code relies on a Unix based command, which in turn means that this almost certainly will not work on Windows based OSes. Windows users will have to find their own hack to load just the last line of the relevant file, or put up with loading the whole historical data file and indexing just the last line. ## Thursday, 21 September 2017 ### Downloading Historical Data Using Oanda's API and R It has been about 5 months since my last blog post and in this time I have been working away from home, been on summer holiday and spent some time mucking about on boats, so I have not been able to devote as much time to my blog as I would have liked. However, that has now changed, and this blog post is about obtaining historical data. Many moons ago I used to download free, EOD data from tradingblox, but they stopped updating this in early 2013. I then started concentrating more on forex data because free sources of this data are more readily available. However, this still meant ( for me at least ) a laborious process of manually downloading .txt/.csv files, with the attendant problem of the data not being fully up to date and then resulting in me doing historical testing on data that I would not be able to trade in real time. With my present focus on machine learning derived trading algorithms this was becoming an untenable position. My solution to this has been to personalize the ROandaAPI code that is freely available from this github, courtesy of IF.FranciscoME. I have stripped out some if statements, hard coded some variables particular to my own Oanda account, added some extra comments for my own enlightenment and broken the API code up into separate .R functions and .R scripts, which I use via RStudio. The first such R script is called to initialise the variables and load the various functions into the R environment, shown below. # Required Packages in order to use the R API functions library("downloader") library("RCurl") library("jsonlite") library("httr") # -- ---------------------------------------------------------------------------------- # # -- Specific Values of Parameters in API for my primary account ---------------------- # # -- ---------------------------------------------------------------------------------- # # -- numeric ------ # My Account ID AccountID = 123456 # -- character ---- # My Oanda Token AccountToken = "blah-de-blah-de-blah" # load the various function files source("~/path/to/InstrumentsList.R") source("~/path/to/R_Oanda_API_functions/ActualPrice.R") source("~/path/to/HisPricesCount.R") source("~/path/to/HisPricesDates.R") # get list of all tradeable instruments trade_instruments = InstrumentsList( AccountToken , AccountID ) View( trade_instruments ) # load some default values # -- character ---- # Granularity of the prices # granularity: The time range represented by each candlestick. The value specified will determine the alignment # of the first candlestick. # choose from S% S10 S15 S30 M1 M2 M3 M4 M5 M10 M15 M30 H1 H2 H3 H4 H6 H8 H12 D W M ( secs mins hours day week month ) Granularity = "D" # -- numeric ------ # Hour of the "End of the Day" # dailyAlignment: The hour of day used to align candles with hourly, daily, weekly, or monthly granularity. # The value specified is interpretted as an hour in the timezone set through the alignmentTimezone parameter and must be # an integer between 0 and 23. DayAlign = 0 # original R code and Oanda default is 17 at "America%2FNew_York" # -- character ---- # Time Zone in format "Continent/Zone # alignmentTimezone: The timezone to be used for the dailyAlignment parameter. This parameter does NOT affect the # returned timestamp, the start or end parameters, these will always be in UTC. The timezone format used is defined by # the IANA Time Zone Database, a full list of the timezones supported by the REST API can be found at # http://developer.oanda.com/docs/timezones.txt # "America%2FMexico_City" was the originallly provided, but could use, for example, "Europe%2FLondon" or "Europe%2FWarsaw" TimeAlign = "Europe%2FLondon" ################################# IMPORTANT NOTE ##################################################################### # By setting DayAlign = 0 and TimeAlign = "Europe%2FLondon" the end of bar is midnight in London. Doing this ensures # that the bar OHLC in data downloads matches the bars seen in the Oanda FXTrade software, which for my account is # Europe Division, e.g. London time. The timestamps on downloads are, however, at GMT times, which means during summer # daylight saving time the times shown on the Oanda software seem to be one hour ahead of GMT. ###################################################################################################################### Start = Sys.time() # Current system time End = Sys.time() # Current system time Count = 500 # Oanda default # now cd to the working directory setwd("~/path/to/oanda_data") The code is liberally commented to describe reasons for my default choices. The InstrumentsList.R function called in the above script is shown next. InstrumentsList = function( AccountToken , AccountID ) { httpaccount = "https://api-fxtrade.oanda.com" auth = c(Authorization = paste("Bearer",AccountToken,sep=" ")) Queryhttp = paste(httpaccount,"/v1/instruments?accountId=",sep="") QueryInst = paste(Queryhttp,AccountID,sep="") QueryInst1 = getURL(QueryInst,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth) InstJson = fromJSON(QueryInst1, simplifyDataFrame = TRUE) FinalData = data.frame(InstJson) colnames(FinalData) = c("Instrument","DisplayName","PipSize","MaxTradeUnits") FinalDataMaxTradeUnits = as.numeric(FinalDataMaxTradeUnits) return(FinalData) } This downloads a list of all the available trading instruments for the associated Oanda account. The following R script actually downloads the historical data for all the trading instruments listed in the above mentioned list and writes the data to separate files; one file per instrument. It also keeps track of the all instruments and the date of the last complete OHLC bar in the historical record and writes this to file also. # cd to the working directory setwd("~/path/to/oanda_data") # dataframe to keep track of updates Instrument_update_file = data.frame( Instrument = character() , Date = as.Date( character() ) , stringsAsFactors = FALSE ) for( ii in 1 : nrow( trade_instruments ) ) { instrument = trade_instruments[ ii , 1 ] # write details of instrument to Instrument_update_file Instrument_update_file[ ii , 1 ] = instrument historical_prices = HisPricesCount( Granularity = "D", DayAlign , TimeAlign , AccountToken ,instrument , Count = 5000 ) last_row_ix = nrow( historical_prices ) if ( historical_prices[ last_row_ix , 7 ] == FALSE ){ # last obtained OHLC bar values are incomplete # and do not want to save incomplete OHLC values, so add date of previous line of complete OHLC data # to Instrument_update_file Instrument_update_file[ ii , 2 ] = as.Date( historical_prices[ last_row_ix - 1 , 1 ] ) # and delete the row with these incomplete values historical_prices = historical_prices[ 1 : last_row_ix - 1 , ] } # end of if statement # Write historical_prices to file write.table( historical_prices , file = paste( instrument , "raw_OHLC_daily" , sep = "_" ) , row.names = FALSE , na = "" , col.names = FALSE , sep = "," ) } # end of for loop # Write Instrument_update_file to file write.table( Instrument_update_file , file = "Instrument_update_file" , row.names = FALSE , na = "" , col.names = TRUE , sep = "," ) This script repeatedly calls the actual download function, HisPricesCount.R, which does all the heavy lifting in a loop, and the code for this download function is HisPricesCount = function( Granularity, DayAlign, TimeAlign, AccountToken, Instrument, Count ){ httpaccount = "https://api-fxtrade.oanda.com" auth = c(Authorization = paste("Bearer",AccountToken,sep=" ")) QueryHistPrec = paste(httpaccount,"/v1/candles?instrument=",sep="") QueryHistPrec1 = paste(QueryHistPrec,Instrument,sep="") qcount = paste("count=",Count,sep="") qcandleFormat = "candleFormat=midpoint" qgranularity = paste("granularity=",Granularity,sep="") qdailyalignment = paste("dailyAlignment=",DayAlign,sep="") QueryHistPrec2 = paste(QueryHistPrec1,qcandleFormat,qgranularity,qdailyalignment,qcount,sep="&") InstHistP = getURL(QueryHistPrec2,cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"),httpheader=auth) InstHistPjson = fromJSON(InstHistP, simplifyDataFrame = TRUE) Prices = data.frame(InstHistPjson[[3]]) Pricestime      = paste(substr(Prices$time,1,10),substr(Prices$time,12,19), sep=" ")
colnames(Prices) = c("TimeStamp","Open","High","Low","Close","TickVolume","Complete")
Prices$TimeStamp = as.POSIXct(strptime(Prices$TimeStamp, "%Y-%m-%d %H:%M:%OS"),origin="1970-01-01",tz = "UTC")
attributes(Prices$TimeStamp)$tzone = TimeAlign
return(Prices)

}
One of the input variables for this function is Count ( default = 5000 ), which means that the function downloads the last 5000 OHLC bar records up to and including the most recent, which may still be forming and hence is incomplete. The calling script ensures that any incomplete bar is stripped from the record so that only complete bars are printed to file.

All in all this is a vast improvement over my previous data collection regime, and kudos to IF.FranciscoME for making the base code available on his github.

## Thursday, 20 April 2017

### Using the BayesOpt Library to Optimise my Planned Neural Net

Following on from my last post, I have recently been using the BayesOpt library to optimise my planned neural net, and this post is a brief outline, with code, of what I have been doing.

My intent was to design a Nonlinear autoregressive exogenous model using my currency strength indicator as the main exogenous input, along with other features derived from the use of Savitzky-Golay filter convolution to model velocity, acceleration etc. I decided that rather than model prices directly, I would model the 20 period simple moving average because it would seem reasonable to assume that modelling a smooth function would be easier, and from this average it is a trivial matter to reverse engineer to get to the underlying price.

Given that my projected feature space/lookback length/number of nodes combination is/was a triple digit, discrete dimensional problem, I used the "bayesoptdisc" function from the BayesOpt library to perform a discrete Bayesian optimisation over these parameters, the main Octave script for this being shown below.
clear all ;

% all_rel_strengths_non_smooth = [ usd_rel_strength_non_smooth eur_rel_strength_non_smooth gbp_rel_strength_non_smooth chf_rel_strength_non_smooth ...
% jpy_rel_strength_non_smooth aud_rel_strength_non_smooth cad_rel_strength_non_smooth ] ;

% extract relevant data
% price = ( eurusd_daily_bars( : , 3 ) .+ eurusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice
% price = ( gbpusd_daily_bars( : , 3 ) .+ gbpusd_daily_bars( : , 4 ) ) ./ 2 ; % midprice
% price = ( usdchf_daily_bars( : , 3 ) .+ usdchf_daily_bars( : , 4 ) ) ./ 2 ; % midprice
price = ( usdjpy_daily_bars( : , 3 ) .+ usdjpy_daily_bars( : , 4 ) ) ./ 2 ; % midprice
base_strength = all_rel_strengths_non_smooth( : , 1 ) .- 0.5 ;
term_strength = all_rel_strengths_non_smooth( : , 5 ) .- 0.5 ;

% clear unwanted data
% clear eurusd_daily_bars all_rel_strengths_non_smooth ;
% clear gbpusd_daily_bars all_rel_strengths_non_smooth ;
% clear usdchf_daily_bars all_rel_strengths_non_smooth ;
clear usdjpy_daily_bars all_rel_strengths_non_smooth ;

global start_opt_line_no = 200 ;
global stop_opt_line_no = 7545 ;

% get matrix coeffs
slope_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 1 ) ;
accel_coeffs = generalised_sgolay_filter_coeffs( 5 , 2 , 2 ) ;
jerk_coeffs = generalised_sgolay_filter_coeffs( 5 , 3 , 3 ) ;

% create features
sma20 = sma( price , 20 ) ;
global targets = sma20 ;
[ sma_max , sma_min ] = adjustable_lookback_max_min( sma20 , 20 ) ;
global sma20r = zeros( size(sma20,1) , 5 ) ;
global sma20slope = zeros( size(sma20,1) , 5 ) ;
global sma20accel = zeros( size(sma20,1) , 5 ) ;
global sma20jerk = zeros( size(sma20,1) , 5 ) ;

global sma20diffs = zeros( size(sma20,1) , 5 ) ;
global sma20diffslope = zeros( size(sma20,1) , 5 ) ;
global sma20diffaccel = zeros( size(sma20,1) , 5 ) ;
global sma20diffjerk = zeros( size(sma20,1) , 5 ) ;

global base_strength_f = zeros( size(sma20,1) , 5 ) ;
global term_strength_f = zeros( size(sma20,1) , 5 ) ;

base_term_osc = base_strength .- term_strength ;
global base_term_osc_f = zeros( size(sma20,1) , 5 ) ;
slope_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
accel_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_bt_osc_f = zeros( size(sma20,1) , 5 ) ;
jerk_bt_osc = rolling_endpoint_gen_poly_output( base_term_osc , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_bt_osc_f = zeros( size(sma20,1) , 5 ) ;

slope_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_base_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_base_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_base_strength = rolling_endpoint_gen_poly_output( base_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_base_strength_f = zeros( size(sma20,1) , 5 ) ;

slope_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 1 ) ; % no_of_points(p),filter_order(n),derivative(s)
global slope_term_strength_f = zeros( size(sma20,1) , 5 ) ;
accel_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 2 , 2 ) ; % no_of_points(p),filter_order(n),derivative(s)
global accel_term_strength_f = zeros( size(sma20,1) , 5 ) ;
jerk_term_strength = rolling_endpoint_gen_poly_output( term_strength , 5 , 3 , 3 ) ; % no_of_points(p),filter_order(n),derivative(s)
global jerk_term_strength_f = zeros( size(sma20,1) , 5 ) ;

min_max_range = sma_max .- sma_min ;

for ii = 51 : size( sma20 , 1 ) - 1 % one step ahead is target

targets(ii) = 2 * ( ( sma20(ii+1) - sma_min(ii) ) / min_max_range(ii) - 0.5 ) ;

% scaled sma20
sma20r(ii,:) = 2 .* ( ( flipud( sma20(ii-4:ii,1) )' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ;
sma20slope(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * slope_coeffs ) ;
sma20accel(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * accel_coeffs ) ;
sma20jerk(ii,:) = fliplr( ( 2 .* ( ( sma20(ii-4:ii,1)' .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) ) * jerk_coeffs ) ;

% scaled diffs of sma20
sma20diffs(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' ) ;
sma20diffslope(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * slope_coeffs ) ;
sma20diffaccel(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * accel_coeffs ) ;
sma20diffjerk(ii,:) = fliplr( diff( 2.* ( ( sma20(ii-5:ii,1) .- sma_min(ii) ) ./ min_max_range(ii) .- 0.5 ) )' * jerk_coeffs ) ;

% base strength
base_strength_f(ii,:) = fliplr( base_strength(ii-4:ii)' ) ;
slope_base_strength_f(ii,:) = fliplr( slope_base_strength(ii-4:ii)' ) ;
accel_base_strength_f(ii,:) = fliplr( accel_base_strength(ii-4:ii)' ) ;
jerk_base_strength_f(ii,:) = fliplr( jerk_base_strength(ii-4:ii)' ) ;

% term strength
term_strength_f(ii,:) = fliplr( term_strength(ii-4:ii)' ) ;
slope_term_strength_f(ii,:) = fliplr( slope_term_strength(ii-4:ii)' ) ;
accel_term_strength_f(ii,:) = fliplr( accel_term_strength(ii-4:ii)' ) ;
jerk_term_strength_f(ii,:) = fliplr( jerk_term_strength(ii-4:ii)' ) ;

% base term oscillator
base_term_osc_f(ii,:) = fliplr( base_term_osc(ii-4:ii)' ) ;
slope_bt_osc_f(ii,:) = fliplr( slope_bt_osc(ii-4:ii)' ) ;
accel_bt_osc_f(ii,:) = fliplr( accel_bt_osc(ii-4:ii)' ) ;
jerk_bt_osc_f(ii,:) = fliplr( jerk_bt_osc(ii-4:ii)' ) ;

endfor

% create xset for bayes routine
% raw indicator
xset = zeros( 4 , 5 ) ; xset( 1 , : ) = 1 : 5 ;
to_add = zeros( 4 , 15 ) ;
to_add( 1 , : ) = [ 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ;
to_add( 2 , : ) = [ 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ;
xset = [ xset to_add ] ;
to_add = zeros( 4 , 21 ) ;
to_add( 1 , : ) = [ 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 ] ;
to_add( 2 , : ) = [ 1 1 2 2 1 2 2 3 3 3 1 2 3 4 2 3 3 4 4 4 4 ] ;
to_add( 3 , : ) = [ 1 1 1 2 1 1 2 1 2 3 1 1 1 1 2 2 3 1 2 3 4 ] ;
xset = [ xset to_add ] ;
to_add = zeros( 4 , 70 ) ;
to_add( 1 , : ) = [ 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ;
to_add( 2 , : ) = [ 1 1 2 2 2 1 2 2 2 3 3 3 3 3 3 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 1 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ] ;
to_add( 3 , : ) = [ 1 1 1 2 2 1 1 2 2 1 2 2 3 3 3 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 1 2 2 1 2 2 3 3 3 1 2 2 3 3 3 4 4 4 4 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 ] ;
to_add( 4 , : ) = [ 1 1 1 1 2 1 1 1 2 1 1 2 1 2 3 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 1 2 1 1 2 1 2 3 1 1 2 1 2 3 1 2 3 4 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ] ;
xset = [ xset to_add ] ;
% construct all_xset for combinations of indicators and look back lengths
all_zeros = zeros( size( xset ) ) ;
all_xset = [ xset ; repmat( all_zeros , 3 , 1 ) ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; xset ; xset ; all_zeros ] ] ;
all_xset = [ all_xset [ xset ; xset ; all_zeros ; xset ] ] ;
all_xset = [ all_xset [ xset ; all_zeros ; xset ; xset ] ] ;
all_xset = [ all_xset repmat( xset , 4 , 1 ) ] ;

ones_all_xset = ones( 1 , size( all_xset , 2 ) ) ;

% now add layer for number of neurons and extend as necessary
max_number_of_neurons_in_layer = 20 ;

parameter_matrix = [] ;

for ii = 2 : max_number_of_neurons_in_layer % min no. of neurons is 2, max = max_number_of_neurons_in_layer
parameter_matrix = [ parameter_matrix [ ii .* ones_all_xset ; all_xset ] ] ;
endfor

% now the actual bayes optimisation routine
% set the parameters
params.n_iterations = 190; % bayesopt library default is 190
params.n_init_samples = 10;
params.crit_name = 'cEIa'; % cEI is default. cEIa is an annealed version
params.surr_name = 'sStudentTProcessNIG';
params.noise = 1e-6;
params.kernel_name = 'kMaternARD5';
params.kernel_hp_mean = [1];
params.kernel_hp_std = [10];
params.verbose_level = 1; % 3 to path below
params.log_filename = '/home/dekalog/Documents/octave/cplusplus.oct_functions/nn_functions/optimise_narx_ind_lookback_nodes_log';
params.l_type = 'L_MCMC' ; % L_EMPIRICAL is default
params.epsilon = 0.5 ; % probability of performing a random (blind) evaluation of the target function.
% Higher values implies forced exploration while lower values relies more on the exploration/exploitation policy of the criterion. 0 is default

% the function to optimise
fun = 'optimise_narx_ind_lookback_nodes_rolling' ;

% the call to the Bayesopt library function
bayesoptdisc( fun , parameter_matrix , params ) ;
% result is the minimum as a vector (x_out) and the value of the function at the minimum (y_out)
What this script basically does is:
1. load all the relevant data ( in this case a forex pair )
2. creates a set of scaled features
3. creates a necessary parameter matrix for the discrete optimisation function
4. sets the parameters for the optimisation routine
5. and finally calls the "bayesoptdisc" function
Note that in step 2 all the features are declared as global variables, this being necessary because the "bayesoptdisc" function of the BayesOpt library does not appear to admit passing these variables as inputs to the function.

The actual function to be optimised is given in the following code box, and is basically a looped neural net training routine.
## Copyright (C) 2017 dekalog
##
## This program is free software; you can redistribute it and/or modify it
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program.  If not, see .

## -*- texinfo -*-
## @deftypefn {} {@var{retval} =} optimise_narx_ind_lookback_nodes_rolling (@var{input1})
##
## @seealso{}
## @end deftypefn

## Author: dekalog
## Created: 2017-03-21

function [ retval ] = optimise_narx_ind_lookback_nodes_rolling( input1 )

% declare all the global variables so the function can "see" them
global start_opt_line_no ;
global stop_opt_line_no ;
global targets ;
global sma20r ; % 2
global sma20slope ;
global sma20accel ;
global sma20jerk ;
global sma20diffs ;
global sma20diffslope ;
global sma20diffaccel ;
global sma20diffjerk ;
global base_strength_f ;
global slope_base_strength_f ;
global accel_base_strength_f ;
global jerk_base_strength_f ;
global term_strength_f ;
global slope_term_strength_f ;
global accel_term_strength_f ;
global jerk_term_strength_f ;
global base_term_osc_f ;
global slope_bt_osc_f ;
global accel_bt_osc_f ;
global jerk_bt_osc_f ;

% build feature matrix from the above global variable according to parameters in input1
hidden_layer_size = input1(1) ;

% training targets
Y = targets( start_opt_line_no:stop_opt_line_no , 1 ) ;

% create empty feature matrix
X = [] ;

% which will always have at least one element of the main price series for the NARX
X = [ X sma20r( start_opt_line_no:stop_opt_line_no , 1:input1(2) ) ] ;

% go through input1 values in turn and add to X if necessary
if input1(3) > 0
X = [ X sma20slope( start_opt_line_no:stop_opt_line_no , 1:input1(3) ) ] ;
endif

if input1(4) > 0
X = [ X sma20accel( start_opt_line_no:stop_opt_line_no , 1:input1(4) ) ] ;
endif

if input1(5) > 0
X = [ X sma20jerk( start_opt_line_no:stop_opt_line_no , 1:input1(5) ) ] ;
endif

if input1(6) > 0
X = [ X sma20diffs( start_opt_line_no:stop_opt_line_no , 1:input1(6) ) ] ;
endif

if input1(7) > 0
X = [ X sma20diffslope( start_opt_line_no:stop_opt_line_no , 1:input1(7) ) ] ;
endif

if input1(8) > 0
X = [ X sma20diffaccel( start_opt_line_no:stop_opt_line_no , 1:input1(8) ) ] ;
endif

if input1(9) > 0
X = [ X sma20diffjerk( start_opt_line_no:stop_opt_line_no , 1:input1(9) ) ] ;
endif

if input1(10) > 0 % input for base and term strengths together
X = [ X base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ;
X = [ X term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(10) ) ] ;
endif

if input1(11) > 0
X = [ X slope_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ;
X = [ X slope_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(11) ) ] ;
endif

if input1(12) > 0
X = [ X accel_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ;
X = [ X accel_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(12) ) ] ;
endif

if input1(13) > 0
X = [ X jerk_base_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ;
X = [ X jerk_term_strength_f( start_opt_line_no:stop_opt_line_no , 1:input1(13) ) ] ;
endif

if input1(14) > 0
X = [ X base_term_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(14) ) ] ;
endif

if input1(15) > 0
X = [ X slope_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(15) ) ] ;
endif

if input1(16) > 0
X = [ X accel_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(16) ) ] ;
endif

if input1(17) > 0
X = [ X jerk_bt_osc_f( start_opt_line_no:stop_opt_line_no , 1:input1(17) ) ] ;
endif

% now the X features matrix has been formed, get its size
X_rows = size( X , 1 ) ; X_cols = size( X , 2 ) ;

X = [ ones( X_rows , 1 ) X ] ; % add bias unit to X

fan_in = X_cols + 1 ; % no. of inputs to a node/unit, including bias
fan_out = 1 ; % no. of outputs from node/unit
r = sqrt( 6 / ( fan_in + fan_out ) ) ;

rolling_window_length = 100 ;
n_iters = 100 ;
n_iter_errors = zeros( n_iters , 1 ) ;
all_errors = zeros( X_rows - ( rolling_window_length - 1 ) - 1 , 1 ) ;
rolling_window_loop_iter = 0 ;

for rolling_window_loop = rolling_window_length : X_rows - 1

rolling_window_loop_iter = rolling_window_loop_iter + 1 ;

% train n_iters no. of nets and put the error stats in n_iter_errors
for ii = 1 : n_iters

% initialise weights
% see https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network

% One option is Orthogonal random matrix initialization for input_to_hidden weights
% w_i = rand( X_cols + 1 , hidden_layer_size ) ;
% [ u , s , v ] = svd( w_i ) ;
% input_to_hidden = [ ones( X_rows , 1 ) X ] * u ; % adding bias unit to X

% using fan_in and fan_out for tanh
w_i = ( rand( X_cols + 1 , hidden_layer_size ) .* ( 2 * r ) ) .- r ;
input_to_hidden = X( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , : ) * w_i ;

% push the input_to_hidden through the chosen sigmoid function
hidden_layer_output = sigmoid_lecun_m( input_to_hidden ) ;

% add bias unit for the output from hidden
hidden_layer_output = [ ones( rolling_window_length , 1 ) hidden_layer_output ] ;

% use hidden_layer_output as the input to a linear regression fit to targets Y
% a la Extreme Learning Machine
% w = ( inv( X' * X ) * X' ) * y ; the "classic" way for linear regression, where
% X = hidden_layer_output, but
w = ( ( hidden_layer_output' * hidden_layer_output ) \ hidden_layer_output' ) * Y( rolling_window_loop - ( rolling_window_length - 1 ) : rolling_window_loop , 1 ) ;
% is quicker and recommended

% use these current values of w_i and w for out of sample test
os_input_to_hidden = X( rolling_window_loop + 1 , : ) * w_i ;
os_hidden_layer_output = sigmoid_lecun_m( os_input_to_hidden ) ;
os_hidden_layer_output = [ 1 os_hidden_layer_output ] ; % add bias
os_output = os_hidden_layer_output * w ;
n_iter_errors( n_iters ) = abs( Y( rolling_window_loop + 1 , 1 ) - os_output ) ;

endfor

all_errors( rolling_window_loop_iter ) = mean( n_iter_errors ) ;

endfor % rolling_window_loop

retval = mean( all_errors ) ;

clear X w_i ;

endfunction

However, to speed things up for some rapid prototyping, rather than use backpropagation training this function uses the principles of an extreme learning machine and loops over 100 such trained ELMs per set of features contained in a rolling window of length 100 across the entire training data set. Walk forward cross validation is performed for each of the 100 ELMs, an average of the out of sample error obtained, and these averages across the whole data set are then averaged to provide the function return. The code was run on daily bars of the four major forex pairs; EURUSD, GBPUSD, USDCHF and USDYPY.

The results of running the above are quite interesting. The first surprise is that the currency strength indicator and features derived from it were not included in the optimal model for any of the four tested pairs. Secondly, for all pairs, a scaled version of a 20 bar price momentum function, and derived features, was included in the optimal model. Finally, again for all pairs, there was a symmetrically decreasing lookback period across the selected features, and when averaged across all pairs the following pattern results: 10 3 3 2 1 3 3 2 1, which is to be read as:
• 10 nodes (plus a bias node) in the hidden layer
• lookback length of 3 for the scaled values of the SMA20 and the 20 bar scaled momentum function
• lookback length of 3 for the slopes/rates of change of the above
• lookback length of 2 for the "accelerations" of the above
• lookback length of 1 for the "jerks" of the above
So it would seem that the 20 bar momentum function is a better exogenous input than the currency strength indicator. The symmetry across features is quite pleasing, and the selection of these "physical motion" features across all the tested pairs tends to confirm their validity. The fact that the currency strength indicator was not selected does not mean that this indicator is of no value, but perhaps it should not be used for regression purposes, but rather as a filter. More in due course.

## Tuesday, 7 February 2017

### Update on Currency Strength Smoothing, and a new direction?

Since my last two posts ( currency strength indicator and preliminary tests thereof ) I have been experimenting with different ways of smoothing the indicators without introducing lag, mostly along the lines of using an oscillator leading signal plus various schemes to smooth and compensate for introduced attenuation and making heavy use of my particle swarm optimisation code. Unfortunately I haven't found anything that really works to my satisfaction and so I have decided to forgo any further attempts at this and just use the indicator in its unsmoothed form as neural net input.

In the process of doing the above work I decided that my particle swarm routine wasn't fast enough and I started using the BayesOpt optimisation library, which is written in C++ and has an interface to Octave. Doing this has greatly decreased the time I've had to spend in my various optimisation routines and the framework provided by the BayesOpt library will enable more ambitious optimisations in the future.

Another discovery for me was this Predicting Stock Market Prices with Physical Laws paper, which has some really useful ideas for neural net input features. In particular I think the idea of combining position, velocity and acceleration with the ideas contained in an earlier post of mine on Savitzky Golay filter convolution and using the currency strength indicators as proxies for the arbitrary sine and cosine waves function posited in the paper hold some promise. More in due course.

## Tuesday, 8 November 2016

### Preliminary Tests of Currency Strength Indicator

Since my last post on the currency strength indicator I have been conducting a series of basic randomisation tests to see if the indicator has better than random predictive ability. The first test was a random permutation test, as described in Aronson's Evidence Based Technical Analysis book, the code for which I have previously posted on my Data Snooping Tests Github page. These results were all disappointing in that the null hypothesis of no predictive ability cannot be rejected. However, looking at a typical chart ( repeated from the previous post but colour coded for signals )
it can be seen that there are a lot of green ( no signal ) bars which, during the randomisation test, can be selected and give equal or greater returns than the signal bars ( blue for longs, red for shorts ). The relative sparsity of the signal bars compared to non-signal bars gives the permutation test, in this instance, low power to detect significance, although I am not able to show that this is actually true in this case.

In the light of the above I decided to conduct a different test, the .m code for which is shown below.
clear all ;

all_random_entry_distribution_results = zeros( 21 , 3 ) ;

tic();

for ii = 1 : 21

clear -x ii all_strengths_quad_smooth_21 all_random_entry_distribution_results ;

if ii == 1
mid_price = ( audcad_daily_bars( : , 3 ) .+ audcad_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 6 ; term_ix = 7 ;
end

if ii == 2
mid_price = ( audchf_daily_bars( : , 3 ) .+ audchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 6 ; term_ix = 4 ;
end

if ii == 3
mid_price = ( audjpy_daily_bars( : , 3 ) .+ audjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 6 ; term_ix = 5 ;
end

if ii == 4
mid_price = ( audusd_daily_bars( : , 3 ) .+ audusd_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 6 ; term_ix = 1 ;
end

if ii == 5
mid_price = ( cadchf_daily_bars( : , 3 ) .+ cadchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 7 ; term_ix = 4 ;
end

if ii == 6
mid_price = ( cadjpy_daily_bars( : , 3 ) .+ cadjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 7 ; term_ix = 5 ;
end

if ii == 7
mid_price = ( chfjpy_daily_bars( : , 3 ) .+ chfjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 4 ; term_ix = 5 ;
end

if ii == 8
mid_price = ( euraud_daily_bars( : , 3 ) .+ euraud_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 6 ;
end

if ii == 9
mid_price = ( eurcad_daily_bars( : , 3 ) .+ eurcad_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 7 ;
end

if ii == 10
mid_price = ( eurchf_daily_bars( : , 3 ) .+ eurchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 4 ;
end

if ii == 11
mid_price = ( eurgbp_daily_bars( : , 3 ) .+ eurgbp_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 3 ;
end

if ii == 12
mid_price = ( eurjpy_daily_bars( : , 3 ) .+ eurjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 5 ;
end

if ii == 13
mid_price = ( eurusd_daily_bars( : , 3 ) .+ eurusd_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 2 ; term_ix = 1 ;
end

if ii == 14
mid_price = ( gbpaud_daily_bars( : , 3 ) .+ gbpaud_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 6 ;
end

if ii == 15
mid_price = ( gbpcad_daily_bars( : , 3 ) .+ gbpcad_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 7 ;
end

if ii == 16
mid_price = ( gbpchf_daily_bars( : , 3 ) .+ gbpchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 4 ;
end

if ii == 17
mid_price = ( gbpjpy_daily_bars( : , 3 ) .+ gbpjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 5 ;
end

if ii == 18
mid_price = ( gbpusd_daily_bars( : , 3 ) .+ gbpusd_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 3 ; term_ix = 1 ;
end

if ii == 19
mid_price = ( usdcad_daily_bars( : , 3 ) .+ usdcad_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 1 ; term_ix = 7 ;
end

if ii == 20
mid_price = ( usdchf_daily_bars( : , 3 ) .+ usdchf_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 1 ; term_ix = 4 ;
end

if ii == 21
mid_price = ( usdjpy_daily_bars( : , 3 ) .+ usdjpy_daily_bars( : , 4 ) ) ./ 2 ; mid_price_rets = [ 0 ; diff( mid_price ) ] ;
base_ix = 1 ; term_ix = 5 ;
end

% the returns vectors suitably alligned with position vector
mid_price_rets = shift( mid_price_rets , -1 ) ;
sma2 = sma( mid_price_rets , 2 ) ; sma2_rets = shift( sma2 , -2 ) ; sma3 = sma( mid_price_rets , 3 ) ; sma3_rets = shift( sma3 , -3 ) ;
all_rets = [ mid_price_rets , sma2_rets , sma3_rets ] ;

% delete burn in and 2016 data ( 2016 reserved for out of sample testing )
all_rets( 7547 : end , : ) = [] ; all_rets( 1 : 50 , : ) = [] ;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% simple divergence strategy - be long the uptrending and short the downtrending currency. Uptrends and downtrends determined by crossovers
% of the strengths and their respective smooths
smooth_base = smooth_2_5( all_strengths_quad_smooth_21(:,base_ix) ) ; smooth_term = smooth_2_5( all_strengths_quad_smooth_21(:,term_ix) ) ;
test_matrix = ( all_strengths_quad_smooth_21(:,base_ix) > smooth_base ) .* ( all_strengths_quad_smooth_21(:,term_ix) &lt; smooth_term) ; % +1 for longs
short_vec = ( all_strengths_quad_smooth_21(:,base_ix) &lt; smooth_base ) .* ( all_strengths_quad_smooth_21(:,term_ix) > smooth_term) ; short_vec = find( short_vec ) ;
test_matrix( short_vec ) = -1 ; % -1 for shorts
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% delete burn in and 2016 data
test_matrix( 7547 : end , : ) = [] ; test_matrix( 1 : 50 , : ) = [] ;
[ ix , jx , test_matrix_values ] = find( test_matrix ) ;
no_of_signals = length( test_matrix_values ) ;

% the actual returns performance
real_results = mean( repmat( test_matrix_values , 1 , size( all_rets , 2 ) ) .* all_rets( ix , : ) ) ;

% set up for randomisation test
iters = 5000 ;
imax = size( test_matrix , 1 ) ;
rand_results_distribution_matrix = zeros( iters , size( real_results , 2 ) ) ;

for jj = 1 : iters
rand_idx = randi( imax , no_of_signals , 1 ) ;
rand_results_distribution_matrix( jj , : ) = mean( test_matrix_values .* all_rets( rand_idx , : ) ) ;
endfor

all_random_entry_distribution_results( ii , : ) = ( real_results .- mean( rand_results_distribution_matrix ) ) ./ ...
( 2 .* std( rand_results_distribution_matrix ) ) ;

endfor % end of ii loop

toc()

save -ascii all_random_entry_distribution_results all_random_entry_distribution_results ;

plot(all_random_entry_distribution_results(:,1),'k','linewidth',2,all_random_entry_distribution_results(:,2),'b','linewidth',2,...
all_random_entry_distribution_results(:,3),'r','linewidth',2) ; legend('1 day','2 day','3 day');
What the code basically does is construct null hypothesis distributions of 1, 2 and 3 day returns of n random entries, where n is the same number of signal bars -1 or +1 as the currency strength indicator signal. The signal returns are then plotted as a line chart of the distance between random return means and signal return means normalised by 2x the random return standard deviations. In this way values >1 approximately correspond to p values < 0.05. Two typical charts are shown below

The first chart shows the results of the unsmoothed currency strength indicator and the second the smoothed version. From this I surmise that the delay introduced by the smoothing is/will be detrimental to performance and so for the nearest future I shall be working on improving the smoothing algorithm used in the indicator calculations.