Introduction
In order to conduct a meaningful review of how the various portfolio systems were developed I have to try to get back into the mindset I had last summer when the bulk of the work was done. That’s not proving that easy, to be honest. I can’t remember my exact train of thought at certain points so can’t quite see why I did certain things and went down certain routes. So some of the details in this review may be sketchy as it seems I didn’t fully document all of what I was doing. That’s the first lesson learned from this review – make extensive notes as you never know when you might need them. I have found some notes and all my spreadsheets so it’s not too bad really but there are still a few assumptions in this review. Regardless, I am confident that this review will be able to highlight a number of reasons why the portfolio systems underperformed this season and should also highlight a few valuable system development lessons too.
System Development Process
I wasn’t really sure how best to present the process I used to develop these systems. I was torn between outlining the process in full and then discussing it or analysing each step of the process as I come to it. Hopefully you will get an idea of how I did what I did.
Before any system development could take place I needed some data to work with. I opted to use data from the four English divisions (Premiership, Championship, League One and Two) for a period of 10 seasons starting with the 2000/01 season. That’s a total of 20360 matches. I took the data from www.football-data.co.uk and cleaned it up a bit (for example I made all the team names consistent throughout the data) before use. I used a Microsoft Excel spreadsheet to store, manipulate and analyse the data. I was then ready to get started.
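The match total can be checked with a quick calculation. This is only a sketch: it assumes a 20-team Premiership and 24-team Football League divisions in every one of the 10 seasons, a full double round-robin in each, and no play-off matches in the data.

```python
def double_round_robin(teams):
    """Matches in a season where every team hosts every other team once."""
    return teams * (teams - 1)

# Assumed division sizes, taken as constant across all 10 seasons.
division_sizes = {
    "Premiership": 20,
    "Championship": 24,
    "League One": 24,
    "League Two": 24,
}

per_season = sum(double_round_robin(n) for n in division_sizes.values())
total_matches = per_season * 10

print(per_season, total_matches)  # 2036 20360
```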
Step 1. Generate home/home, away/away, home/all and away/all six-game form and streaks data.
The data I had obtained from football-data was only a starting point. I used that data along with a few VBA macros to generate a huge volume of additional data for each match. These portfolio systems made use of the form and streaks data but a significant amount of other data was also generated and I hope to make good use of it in the future.
The form data was generated by examining results from a team’s last six games. It comprises the form string e.g. WWWDWL, with most recent result on the right and W equalling a win, D a draw and L a loss; the number of goals for and against; the number of points earned with three points awarded for a win and one for a draw; plus the number of games that have gone under/over 2.5 goals.
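The original work was done with Excel and VBA macros, so the Python below is only an illustrative sketch of the six-game form summary described above. The `Result` class, the `form_summary` function and the scores are all made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Result:
    goals_for: int
    goals_against: int

def form_summary(last_six):
    """Summarise up to six results, listed oldest first (most recent last)."""
    form = ""
    points = 0
    overs = 0
    for r in last_six:
        if r.goals_for > r.goals_against:
            form += "W"
            points += 3          # three points for a win
        elif r.goals_for == r.goals_against:
            form += "D"
            points += 1          # one point for a draw
        else:
            form += "L"
        if r.goals_for + r.goals_against > 2.5:
            overs += 1
    return {
        "form": form,
        "goals_for": sum(r.goals_for for r in last_six),
        "goals_against": sum(r.goals_against for r in last_six),
        "points": points,
        "over_2_5": overs,
        "under_2_5": len(last_six) - overs,
    }

# Made-up scores that produce the WWWDWL string used in the text.
games = [Result(2, 0), Result(1, 0), Result(3, 1),
         Result(1, 1), Result(2, 1), Result(0, 2)]
summary = form_summary(games)
print(summary["form"], summary["points"])  # WWWDWL 13
```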
The streaks data counts the number of games a team has gone without a certain result or event happening, including: number of games without a win; games without a draw; games since last defeat; games without scoring a goal; number of games without conceding a goal; games since a match that ended with under 2.5 goals and number of games since a result that was over 2.5 goals.
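A streak counter along these lines can be sketched in a few lines of Python. Again this is an illustration rather than the VBA actually used. The function names are hypothetical, and the sketch assumes that if the event has never happened in the available history the streak simply equals the number of games on record.

```python
def games_since(results, event):
    """Count games played since `event` last occurred; results are oldest first."""
    count = 0
    for result in reversed(results):
        if event(result):
            break
        count += 1
    return count

def streaks(results):
    """results: list of (goals_for, goals_against) tuples, oldest first."""
    return {
        "without_win": games_since(results, lambda r: r[0] > r[1]),
        "without_draw": games_since(results, lambda r: r[0] == r[1]),
        "since_defeat": games_since(results, lambda r: r[0] < r[1]),
        "without_scoring": games_since(results, lambda r: r[0] > 0),
        "without_conceding": games_since(results, lambda r: r[1] > 0),
        "since_under_2_5": games_since(results, lambda r: r[0] + r[1] < 2.5),
        "since_over_2_5": games_since(results, lambda r: r[0] + r[1] > 2.5),
    }

# Made-up results: lost 0-1, drew 1-1, won 2-0, drew 0-0, lost 1-3.
recent = [(0, 1), (1, 1), (2, 0), (0, 0), (1, 3)]
s = streaks(recent)
print(s["without_win"], s["without_draw"], s["since_defeat"])  # 2 1 0
```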
For each data category (form, streaks etc) I generated four sets of data: home/home, away/away, home/all and away/all. Home/home data relates to matches the home team has played at their home ground, away/away is data from away matches played by the away team whereas home/all and away/all refer to games played by the home and away side respectively regardless of whether they were played at home or away. This is perhaps made clearer with an example.
Take the final match of the 2009/10 Premiership season – Wolves v Sunderland – as our example. For this fixture, as with all fixtures, we want to generate data across the various data categories based on previous results. This data will be divided into four sets, each using different results. The home/home data will use data from the home team’s (Wolves’) recent games at Molineux. Thus the six-game home/home form data will be based on Wolves matches at home to Tottenham, Chelsea, Man United, Everton, Stoke and Blackburn. And you can check the fixture lists for that season to confirm they were Wolves’ opponents prior to the season closer against Sunderland. The six-game away/away form data will be based on Sunderland’s six most recent away games prior to their visit to Wolves. That means their trips to Portsmouth, Arsenal, Aston Villa, Liverpool, West Ham and Hull. The home/all form data would be based on the last six games Wolves played regardless of where they took place, i.e. Wolves v Everton, Arsenal v Wolves, Wolves v Stoke, Fulham v Wolves, Wolves v Blackburn and Portsmouth v Wolves. Similarly the away/all form data uses Sunderland’s last six games, which were Liverpool v Sunderland, Sunderland v Tottenham, West Ham v Sunderland, Sunderland v Burnley, Hull v Sunderland and Sunderland v Man United.
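The way the four sets pick their matches can be sketched as below. The `Match` tuple, the `last_n` helper and the team names are hypothetical, the fixture list is invented, and `n=2` is used instead of six to keep the example short.

```python
from collections import namedtuple

# Dates are ISO strings so they sort chronologically as plain text.
Match = namedtuple("Match", "date home away")

def last_n(matches, team, before, venue="all", n=6):
    """Return the team's last n matches before a date, filtered by venue.

    venue is "home", "away" or "all".
    """
    played = [
        m for m in matches
        if m.date < before and (
            (venue in ("home", "all") and m.home == team)
            or (venue in ("away", "all") and m.away == team)
        )
    ]
    played.sort(key=lambda m: m.date)
    return played[-n:]

matches = [
    Match("2010-01-01", "Reds", "Greens"),
    Match("2010-01-08", "Blues", "Reds"),
    Match("2010-01-15", "Reds", "Whites"),
    Match("2010-01-22", "Greens", "Blues"),
    Match("2010-01-29", "Whites", "Reds"),
    Match("2010-02-05", "Blues", "Whites"),
]
fixture = Match("2010-02-12", "Reds", "Blues")

home_home = last_n(matches, fixture.home, fixture.date, "home", n=2)
away_away = last_n(matches, fixture.away, fixture.date, "away", n=2)
home_all = last_n(matches, fixture.home, fixture.date, "all", n=2)
away_all = last_n(matches, fixture.away, fixture.date, "all", n=2)
```

Note that `away_away` only holds one match here because the invented away side has played just one away game, which is the sort of early-season gap any real implementation would need to decide how to handle.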
Step 2. Analyse form and streak data looking at more than 100 form and 50 streak potential system ideas for each of the four sets of data, with the summary performance for home wins, draws, away wins, under 2.5 goals and over 2.5 goals recorded.
Having generated all the necessary data, the next step was to start analysing it. This was done in a methodical manner, working through the four form-based data sets (home/home, away/away etc.) before tackling the streak-based data. For each of the four form-based data sets more than 100 system ideas were identified and analysed. These included teams drawing their last match, earning at least X points in their last six games, conceding no more than X goals in their previous six games, the goal difference of their last six games exceeding X, the total number of goals scored and conceded in the last six games being under X and so on. For each potential system idea a theoretical 1pt bet was placed on the home win, draw, away win, under 2.5 goals and over 2.5 goals whenever a match met the relevant selection criteria and the number of qualifiers, strike rate and ROI for each were noted. I found the files containing this information so I did at least document some of what I was doing – phew!
A similar process was then used for the streak data, with approximately 50 potential systems being analysed for each of the four data sets. These included teams being unbeaten in their last X matches, teams having gone X games without a draw and so on. As with the form data the potential returns from theoretical 1pt bets were recorded so one could soon see, for example, whether it was worth backing the home win when the home team were unbeaten in their last five games at home or whether the away team might be trading at value odds.
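The three figures recorded for each idea (qualifiers, strike rate and ROI) are straightforward to compute from a list of 1pt bets. The sketch below assumes decimal odds and level stakes. The `evaluate` function and the sample bets are invented for illustration.

```python
def evaluate(bets):
    """bets: list of (decimal_odds, won) pairs, each a 1pt level-stakes bet."""
    n = len(bets)
    if n == 0:
        return {"qualifiers": 0, "strike_rate": 0.0, "roi": 0.0}
    wins = sum(1 for _, won in bets if won)
    # A winner returns odds - 1 points of profit; a loser costs the 1pt stake.
    profit = sum((odds - 1) if won else -1 for odds, won in bets)
    return {"qualifiers": n, "strike_rate": wins / n, "roi": profit / n}

sample = [(2.0, True), (3.0, False), (2.5, True), (1.8, False)]
stats = evaluate(sample)
print(stats)  # {'qualifiers': 4, 'strike_rate': 0.5, 'roi': 0.125}
```

Here two winners from four bets give a 50% strike rate, and a profit of 0.5pts on 4pts staked gives an ROI of 12.5%.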
With over 100 system ideas for the four form-based data sets and a further 50 for each of the streak-based data sets and the returns from home wins, draw, away wins, under 2.5 goals and over 2.5 goals recorded this meant in excess of 3000 potential systems to be analysed.
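As a rough check on that figure, taking 100 form ideas and 50 streak ideas as round numbers (the text says "more than 100" and "approximately 50"):

```python
data_sets = 4      # home/home, away/away, home/all, away/all
markets = 5        # home win, draw, away win, under 2.5, over 2.5
form_ideas = 100   # round-number assumption
streak_ideas = 50  # round-number assumption

total_systems = data_sets * markets * (form_ideas + streak_ideas)
print(total_systems)  # 3000
```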
Step 3. Analyse and optimise above 3000+ system ideas using month, division and odds filters
It was rare for any of the 3000+ system ideas that came out of step 2 to be profitable in the raw form they were in at this stage so each was subjected to an analysis and optimisation process. The spreadsheet I had crafted provided me with a full breakdown of each potential system, splitting the data by season, month, division and odds. The seasonal split simply gave me an idea whether a couple of freak years were providing the vast majority of the profits or the trend generated consistent profits through the years. The month, division and odds splits were used to filter the data with a view to improving the profitability.
The three filter types were used in combination in order to optimise the trend. After applying each filter I would re-analyse the data, generating a new set of data for each of the various breakdowns so that I could see the effect of each filter in isolation. For example, the initial breakdowns may indicate that the system is particularly profitable in the Premiership and when the home team is odds on. I would apply the Premiership filter and re-examine the breakdowns for the filtered data as the odds brackets that looked profitable before may not stand out as much having filtered the data.
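The filter-then-re-analyse loop can be sketched as follows. This is not the spreadsheet logic itself. The `breakdown` helper, the field names and the handful of bets are all made up, and the odds are reduced to two bands to keep the example small.

```python
from collections import defaultdict

def breakdown(bets, field):
    """Group 1pt bets by `field`; return {value: (bets, strike rate, ROI)}."""
    groups = defaultdict(list)
    for bet in bets:
        groups[bet[field]].append(bet)
    summary = {}
    for value, group in groups.items():
        n = len(group)
        wins = sum(1 for b in group if b["won"])
        profit = sum((b["odds"] - 1) if b["won"] else -1 for b in group)
        summary[value] = (n, wins / n, profit / n)
    return summary

bets = [
    {"division": "Premiership", "odds_band": "odds on", "odds": 1.8, "won": True},
    {"division": "Premiership", "odds_band": "odds against", "odds": 2.2, "won": False},
    {"division": "Championship", "odds_band": "odds on", "odds": 1.9, "won": False},
    {"division": "Premiership", "odds_band": "odds on", "odds": 1.6, "won": True},
]

# Initial breakdowns on the unfiltered data.
by_division = breakdown(bets, "division")
by_band = breakdown(bets, "odds_band")

# Apply the Premiership filter, then re-examine the odds breakdown.
prem_only = [b for b in bets if b["division"] == "Premiership"]
by_band_after_filter = breakdown(prem_only, "odds_band")
```

In this toy data the odds-on band looks far stronger after the Premiership filter than before it, which is exactly why each breakdown had to be regenerated after every filter. With samples this small it also shows how easily a filter can manufacture an impressive-looking figure.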
Each potential system was analysed to provide a breakdown of the performance by month, division and odds range. It was rare for any of the systems to be profitable in the raw form they were in at the end of the previous step, certainly not over any significant number of bets anyway. However, a few filters could soon rectify that and this step revealed numerous profitable little trends. By splitting the overall figures down so I could see the number of bets, strike rate and ROI for each month, division and odds range I could start to mine for areas I felt I could exploit for profit.
The available month and divisional filters were obviously the same for all systems - trends could be filtered down into bets during August, September etc as well as by Premiership, Championship, League One and League Two. The odds filters depended on the market in question. For example, for home wins my lowest price bracket was ‘under 1.50’ but there would be no point having such a bracket for draws which are always 2/1 or greater. A total of eight price brackets were employed for each of the 1×2 markets with the brackets varying in size depending on how much granularity I wanted from the analysis. Odds filters were not employed in the analysis of the under/over 2.5 goals bets due to the fact there is so little variation in the odds.
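Assigning a price to one of eight brackets might look something like the sketch below. Only the ‘under 1.50’ bracket comes from the text. Every other edge in `HOME_WIN_EDGES` is a hypothetical placeholder, as is the `bracket_label` function.

```python
from bisect import bisect_right

# Seven edges give eight brackets. Only 1.50 is taken from the text;
# the remaining edges are invented for illustration.
HOME_WIN_EDGES = [1.50, 1.80, 2.00, 2.25, 2.50, 3.00, 4.00]

def bracket_label(odds, edges):
    """Return a readable label for the bracket containing decimal `odds`."""
    i = bisect_right(edges, odds)
    if i == 0:
        return f"under {edges[0]:.2f}"
    if i == len(edges):
        return f"{edges[-1]:.2f} and over"
    return f"{edges[i - 1]:.2f} to under {edges[i]:.2f}"

print(bracket_label(1.44, HOME_WIN_EDGES))  # under 1.50
print(bracket_label(1.50, HOME_WIN_EDGES))  # 1.50 to under 1.80
print(bracket_label(5.00, HOME_WIN_EDGES))  # 4.00 and over
```

A separate list of edges would be needed for the draw and away win markets, since their prices sit in different ranges.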
When filtering these potential system ideas I tried to ask myself why a trend would be profitable in some cases and not others. For example, why should a trend be profitable in some months and not others? It could be that the form needs to settle down meaning a system isn’t necessarily profitable at the start of the season. A trend may only be profitable in the top divisions due to the higher quality of the teams and football played (although I have to admit I don’t really buy that one any more). Is there any reason why short-odds selections do worse than the longshots? Perhaps bookies are shortening so-called banker teams like Manchester United and Chelsea knowing that casual punters will back them whatever odds they are which means the opposition may be available at fantastic odds.
I say I tried to ask questions like those above during the filtering process and as far as I can recall I was reasonably diligent. I’m pretty sure I didn’t note down all the profitable trends I identified but in hindsight I should have been more strict with this idea and questioned the filters applied in many more cases. Some of these filtered trends seem pretty hard to justify now that I look at them again.
During this filtering and optimisation process I tried to ensure that the resultant samples were not too small and that the number of bets remained significant. I was trying to avoid the situation whereby a profitable trend was identified but it only throws up a bet once every blue moon as that doesn’t suit anyone. Such systems have a massive risk of a bet being missed (either I miss posting it or you miss backing it) and inevitably that will be a winner and the next one won’t be along for ages. I was trying to strike a balance between a ridiculously heavy workload and a stupidly small number of bets. I am not sure I succeeded in all cases.
Step 4. Best trends packaged into first draft portfolio systems
At the end of the previous step I had identified a large number of profitable trends. My plan was to package these up into a number of portfolio systems. It was always my intention to develop a number of systems with each having a different focus. I wasn’t sure at the start of this process how many systems I was aiming for, I was waiting to see how many the data threw up really. But by this stage of the process it was obvious that I should have separate systems for each of the bet types in the 1×2 and under/over markets. I also wanted to keep the form-based data and streak-based data separate so I could monitor the effectiveness of them individually. I didn’t have enough in the way of suitable streak-based trends to separate out all options in the 1×2 market so I went for a combined system there and similarly with the under/over 2.5 goals market.
I began to group the various trends into several categories in order to form the first draft of the portfolio systems. All the form-based trends that selected home wins were put into one portfolio system, all the form-based draw trends in another and so on. The number of trends included in each portfolio system at this point varied significantly. The first draft Home Win Form Portfolio System contained 25 different trends while the Under Form Portfolio System comprised 65 unique trends. This meant each portfolio system varied significantly in terms of number of bets, strike rate and profit but that wasn’t my concern at this stage.
As I had this portfolio system approach in mind it perhaps influenced some of the work done in the previous step and meant that some trends made the cut that perhaps shouldn’t have done. Knowing that I was going to bundle together several trends I might have continued to develop some very specific trends that wouldn’t stand up by themselves knowing they would be hidden away in a portfolio with other systems/trends for support. For example, I worked on some trends that were only profitable at the start of the season knowing they could be packaged with others that were profitable at other points throughout the year. In effect I generated some systems that completely changed in nature depending on the month.
The portfolio system approach is one I will use in the future as I think there is great value in it but one needs to be quite careful how it is applied. I think the key is to first develop a number of individual systems that are all profitable in their own right, but also systems that one would be happy to follow on their own. It’s that last condition I failed to meet in some cases. The advantage of portfolio systems is that they can help smooth out the various ups and downs. But if any of the constituent parts fail it can be difficult to spot that as early as one would if following each system separately.
Step 5. Portfolio systems optimised by adding/removing individual trends
I said above that the first draft portfolio systems varied greatly in terms of workload and profitability. This step was to address that issue. I went into it with a vague idea of the profile I was aiming for. I wanted to avoid systems that generated little action so wanted each of the various portfolio systems I was building to provide a good number of bets each season. And I obviously wanted a decent rate of return. On that front a return on investment of 10% was my absolute basement figure and I was aiming for closer to 15% or even 20%. What’s more I wanted a steady accumulation of profits rather than the bulk of them coming in a short period. With this image of a ‘perfect’ system in mind I could start to optimise each of the portfolios.
The first step in the optimisation process was to obtain statistical breakdowns (by season, month, division and odds range as in Step 3) for each of the individual trends in the portfolio. This was a simple process that naturally dropped out of the previous steps. Once I had all this information I used my spreadsheets to toggle each individual trend on and off to see what impact it had on the overall figures as well as what changes it made to the profit accumulation graph. I was looking for a combination of a number of trends to give me the desired number of bets per season, the right sort of level of return and the steady profit accumulation I was after.
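Toggling trends on and off amounts to recomputing the combined figures for whichever subset is enabled. The sketch below is a guess at the general shape rather than the actual spreadsheet. The `portfolio_stats` function and the trend data are invented, and it assumes a match that qualifies under two trends is simply counted as two bets.

```python
def portfolio_stats(trends, enabled):
    """Combine the bets of the enabled trends.

    trends: {name: [(date, decimal_odds, won), ...]} with ISO date strings.
    Returns (number of bets, ROI, cumulative profit curve).
    """
    bets = sorted(
        (bet for name in enabled for bet in trends[name]),
        key=lambda bet: bet[0],
    )
    curve = []
    running = 0.0
    for _, odds, won in bets:
        running += (odds - 1) if won else -1
        curve.append(running)
    n = len(bets)
    roi = running / n if n else 0.0
    return n, roi, curve

trends = {
    "A": [("2009-08-15", 2.0, True), ("2009-09-12", 2.0, False)],
    "B": [("2009-08-22", 3.0, False), ("2009-10-03", 2.5, True)],
}

both_on = portfolio_stats(trends, ["A", "B"])
only_a = portfolio_stats(trends, ["A"])
print(both_on)  # (4, 0.125, [1.0, 0.0, -1.0, 0.5])
print(only_a)   # (2, 0.0, [1.0, 0.0])
```

The curve is what would be plotted as the profit accumulation graph. Choosing the subset that gives the smoothest curve is precisely the kind of optimisation that invites back-fitting.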
At this stage it didn’t really matter which of the individual trends I was including and which were omitted. I wasn’t really paying any attention to how the individual trend rules would combine to form the parent portfolio system. Take the Over Form Portfolio System as an example. In August the bets were being selected by trends based on things like: goal difference in all divisions except the Championship; away teams in the Premiership having drawn their last two matches and the number of goals scored by the away side. In hindsight it seems like an odd set of rules, doesn’t it? When combined to form the parent system they don’t seem to make any sense together or complement one another. My focus at this stage was simply getting the portfolio system to meet the desired profile. Looking back I can see I was so obsessed with developing that ‘perfect’ system that delivered smooth, consistent profits that I took my eye off the ball when it came to the actual make-up of the system.
Step 6. Live trial on blog
The final step in the process was to conduct a live trial of all portfolio systems. Backtesting is obviously a good idea in order to get some idea of the likely performance but live testing is an essential part of any strategy. How else can you know if your idea will work in the real world or not? So I set this blog up and put together the spreadsheets I needed to make sorting out the qualifiers easier and started posting.
We’ve already seen that for the most part the live trial didn’t work in the slightest. The systems failed catastrophically and a couple of them recorded heavy losses. I hope none of you got your fingers burned too much. I did say right from the outset that this was a live trial and the first time these systems had been subjected to real data rather than backtesting. I followed the systems to small stakes up until January so I have literally paid the price for a flawed development process.
Conclusions
Having reviewed the system development process I think it’s fair to say there are a number of flaws in there, many of which I have already identified. I may not have spotted every weakness in my work but I am sure that addressing the big issues I have picked out will greatly improve the quality of future work. I certainly don’t feel that the whole process was flawed though. There are some steps that I would carry through to future developments lock, stock and barrel. But there are also other steps I would never use again in their current form. The key is to recognise which is which and learn from this experience.
I think the problems started during the analysis and optimisation of the individual system trends. Some of the trends were too specific and, despite writing warnings about the dangers of back-fitting throughout what documentation exists for this work, I still fell into that trap somewhat. For example, one trend I used was: home team drew last match, month is November and home win odds (for the next match) are greater than evens. OK, if a team drew their last match then perhaps that will affect their win odds next time out, but why should this trend work in November but not October, December or any other point of the season?
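Written out as a selection rule, the November trend looks like the sketch below. The field names are hypothetical, evens is taken as 2.0 in decimal odds, and the rule does not say whether "last match" means the home/home or home/all data set, so the form string here is just a stand-in.

```python
def qualifies(match):
    """Hypothetical encoding of the 'drew last match, November, odds against' trend."""
    drew_last = match["home_form"].endswith("D")   # most recent result is on the right
    in_november = match["month"] == 11
    odds_against = match["home_odds"] > 2.0        # greater than evens
    return drew_last and in_november and odds_against

example = {"home_form": "WLWWLD", "month": 11, "home_odds": 2.4}
same_in_october = dict(example, month=10)

print(qualifies(example))          # True
print(qualifies(same_in_october))  # False
```

Seeing the month sit there as a hard-coded condition makes the problem plain: nothing about the football changes between the two dictionaries, only the calendar.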
Shortly before I started this system development process I had been reading about the effects of the weather on football results. The articles concerned the average number of goals per game during sunny and rainy periods. I found it interesting stuff and began to do a little work of my own in this area. However, I think I got carried away somewhat and made far too many assumptions when it came to developing the portfolio systems. Take the Under Form Portfolio and Over Form Portfolio as examples. The former has very few bets in the first few months of the season and really ramps up the activity from January onwards while the latter is more or less the opposite. Poor weather leads to fewer goals on average but this idea has been taken to the extreme somewhat here. I have made general assumptions about the weather and applied them to the development process. A certain under 2.5 goals trend is profitable in January and February because the weather is lousy, right? Perhaps, but is the weather in December not also crappy? Why isn’t the trend profitable then too?
I stated earlier that I tried to question why trends should be profitable only when certain filters are applied. As you have just seen my attempts at justifying some of those filters now look quite flaky and were I (or someone else) to go back over these trends I doubt many of them would stand up to scrutiny. Filtering the trends by month is perhaps the greatest weakness of this work. To some extent this was justified on the basis of weather as I mentioned above but that’s not a solid enough reason for such filtering. The divisional filtering is largely based on the ability of the average player in that division and the class of football played but I no longer feel the difference is that significant. However, there is also the fact that bookies see a much greater turnover on Premiership matches than lower leagues so the odds compilers look to get the Premiership markets spot on leaving less time for other divisions. That may partly explain why some trends are profitable in certain divisions and not others. That said I would expect the filters to take the form of ‘Premiership only’, ‘lower divisions only’ and not ‘all divisions except the Championship’ as was the case for some of my trends.
I was blind to some of the weaknesses of the filtering process as the individual trends would be packaged into portfolio systems. I think that at the time I had the view that the weaknesses of the individual trends would be glossed over because other trends would be generating profits. The portfolio system idea seemed like a magic bullet with few drawbacks. Obviously that’s not the case and I have already mentioned that I now feel the individual trends forming any portfolio system must also be capable of standing alone. I identified a large number of profitable little trends but many of them were far too specific to stand alone so the danger of backfitting becomes very real. I found it hard to throw most of these trends away. I should have used the gardening principle of thinning down to only the strongest but I didn’t.
I was also striving too hard to develop a system that fitted my view of the perfect system. I wanted the number of bets, strike rate and ROI all to fall into a certain range of values and that drove the development far more than it should have done. More than that I was seeking a smooth profit curve and tried, where possible, to balance the number of bets per month too. There is an air of the tail wagging the dog about this. At the time I had it at the back of my mind that I shouldn’t be including/excluding trends just to balance the profits but I still did it. This is gambling and profits generally don’t come along at a steady rate. There are peaks and troughs, winning and losing runs, good and bad spells. They are part of the game and you can’t just smooth them out, not here in the real world.
Where do we go from here? Obviously the systems can’t run again as they are, not knowing what I do about the development process. How can anyone expect the product of a flawed development process to work? That said, appearances can be deceiving. The Over Form Portfolio looks like it went exactly to plan in League One this season but you have to ask yourself why it worked in that division and not others. I’m pretty sure this season was a fluke and were you to follow the Over Form Portfolio in League One next season there is certainly no guarantee it will perform to anything like the same standard. You have to be careful how much you read into results breakdowns. Anyway, I will retire this current suite of systems and go back to the drawing board. Whether I will have any offerings for next season remains to be seen. I hope to have something else to trial but I’m not sure whether I will have the time to develop something. One thing is for sure though, I have learned several lessons this season so hopefully I can reap the benefits over the coming seasons.