WEBVTT

1
00:00:00.020 --> 00:00:00.180
Hello,

2
00:00:00.300 --> 00:00:02.062
welcome back to the Papers with Backtest podcast.

3
00:00:02.363 --> 00:00:05.023
Today we dive into another algo trading research paper.

4
00:00:05.145 --> 00:00:05.344
Yeah,

5
00:00:05.645 --> 00:00:06.766
looking forward to this one.

6
00:00:06.906 --> 00:00:07.027
So,

7
00:00:07.246 --> 00:00:07.488
you know,

8
00:00:07.848 --> 00:00:08.852
for hedge fund managers,

9
00:00:09.391 --> 00:00:10.828
it's this constant grind,

10
00:00:10.867 --> 00:00:11.172
isn't it?

11
00:00:11.352 --> 00:00:12.773
The pressure to beat the benchmark.

12
00:00:12.836 --> 00:00:12.953
Oh,

13
00:00:12.992 --> 00:00:13.492
absolutely.

14
00:00:13.891 --> 00:00:15.352
Fall behind and investors,

15
00:00:15.453 --> 00:00:15.695
well,

16
00:00:16.078 --> 00:00:17.297
they might start looking elsewhere.

17
00:00:17.625 --> 00:00:18.391
It's tough out there,

18
00:00:18.859 --> 00:00:26.344
especially if you lean towards the efficient market hypothesis that beating the market consistently is basically impossible,

19
00:00:26.453 --> 00:00:27.188
or at least very,

20
00:00:27.266 --> 00:00:27.875
very difficult.

21
00:00:28.125 --> 00:00:28.328
Right.

22
00:00:28.444 --> 00:00:29.905
Which leads to the big question,

23
00:00:30.566 --> 00:00:32.427
where do you find that edge,

24
00:00:32.907 --> 00:00:33.470
that alpha?

25
00:00:33.630 --> 00:00:34.111
Exactly.

26
00:00:34.572 --> 00:00:35.431
And this paper,

27
00:00:35.712 --> 00:00:38.931
it digs into alternative data as maybe one answer for the pros.

28
00:00:39.837 --> 00:00:40.517
Alternative data.

29
00:00:40.853 --> 00:00:44.056
So we're talking about information that's unique,

30
00:00:44.337 --> 00:00:47.009
not your standard market data feeds you get everywhere.

31
00:00:47.165 --> 00:00:47.368
Okay,

32
00:00:47.415 --> 00:00:49.087
so not just stock prices and P/E

33
00:00:49.181 --> 00:00:49.587
ratios.

34
00:00:49.884 --> 00:00:50.009
No,

35
00:00:50.228 --> 00:00:51.040
think outside the box,

36
00:00:51.134 --> 00:00:52.212
like satellite images,

37
00:00:52.275 --> 00:00:54.009
maybe tracking cars in parking lots,

38
00:00:54.493 --> 00:00:56.290
or sensor data from supply chains.

39
00:00:56.493 --> 00:00:56.884
Interesting.

41
00:00:57.088 --> 00:00:57.588
But this paper,

42
00:00:57.728 --> 00:01:03.170
it really zooms in on web data because it's just vast and so diverse.

43
00:01:03.568 --> 00:01:04.049
Web data.

44
00:01:04.549 --> 00:01:04.748
Right.

45
00:01:04.850 --> 00:01:06.408
So company websites,

46
00:01:06.447 --> 00:01:07.752
product prices online,

47
00:01:08.432 --> 00:01:08.869
forums,

48
00:01:08.947 --> 00:01:09.447
news sites.

49
00:01:09.627 --> 00:01:09.815
Yeah.

50
00:01:10.127 --> 00:01:11.650
It's potentially huge.

51
00:01:12.049 --> 00:01:12.471
It is.

52
00:01:12.572 --> 00:01:15.213
And apparently investors are noticing big time.

53
00:01:15.713 --> 00:01:22.572
The paper mentions projections of over $7 billion being invested in alt data, and the tech to handle it, by 2020.

54
00:01:22.932 --> 00:01:23.807
$7 billion.

55
00:01:24.057 --> 00:01:24.369
Wow.

56
00:01:25.041 --> 00:01:26.682
That's serious commitment.

58
00:00:26.876 --> 00:00:28.398
People really believe there's value there.

59
00:01:28.498 --> 00:01:28.898
Definitely.

60
00:01:29.078 --> 00:01:36.004
The core idea really is that digging into this non-traditional data can uncover insights you just wouldn't get otherwise.

61
00:01:36.027 --> 00:01:36.269
You know,

62
00:01:36.347 --> 00:01:39.168
find things before they show up in regular financial reports.

63
00:01:39.332 --> 00:01:41.089
Potentially giving you a jump on the market.

64
00:01:41.293 --> 00:01:41.871
That's the goal.

65
00:01:41.894 --> 00:01:43.925
It's about generating that unique alpha,

66
00:01:44.113 --> 00:01:45.191
that performance edge.

67
00:01:45.519 --> 00:01:50.457
Sophisticated investors see it as a way to really complement their traditional analysis.

68
00:01:50.675 --> 00:01:50.879
Okay.

69
00:01:50.880 --> 00:01:51.863
So let's talk specifics.

70
00:01:52.238 --> 00:01:55.972
How does this messy web data actually turn into, like, trading rules?

71
00:01:56.436 --> 00:01:59.059
The paper mentions aggregating corporate operational data.

72
00:01:59.199 --> 00:01:59.959
What does that look like?

73
00:02:00.020 --> 00:02:00.160
Well,

74
00:02:00.240 --> 00:02:02.463
imagine scraping data continuously from,

75
00:02:02.561 --> 00:02:02.744
say,

76
00:02:03.404 --> 00:02:05.143
job boards across an entire industry.

77
00:02:05.244 --> 00:02:05.986
Not just one company,

78
00:02:06.049 --> 00:02:06.908
but all its competitors,

79
00:02:06.947 --> 00:02:07.127
too.

80
00:02:07.424 --> 00:02:11.330
If you suddenly see one company ramping up hiring way more than others,

81
00:02:12.033 --> 00:02:16.478
that could be an early signal, maybe of growth or of a new project kicking off.

82
00:02:16.650 --> 00:02:16.869
Ah,

83
00:02:16.963 --> 00:02:17.291
I see.

84
00:02:17.416 --> 00:02:20.822
So a potential trading signal based on aggregated hiring trends.

85
00:02:20.853 --> 00:02:21.307
Exactly.

86
00:02:21.400 --> 00:02:23.666
It's about spotting those relative changes,

87
00:02:23.728 --> 00:02:25.447
those anomalies derived from the web.

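NOTE
A minimal sketch of the relative-hiring idea discussed above, assuming you already have weekly job-posting counts per company scraped from job boards across one industry. The function names, the four-week lookback and the z-score cutoff of 2.0 are hypothetical illustration choices, not parameters from the paper.
```python
from statistics import mean, pstdev
def hiring_growth(counts: list[int], lookback: int = 4) -> float:
    # Growth of postings in the latest `lookback` weeks versus the `lookback` weeks before them
    recent = sum(counts[-lookback:])
    prior = sum(counts[-2 * lookback:-lookback])
    return (recent - prior) / prior if prior else 0.0
def hiring_anomalies(postings: dict[str, list[int]], cutoff: float = 2.0) -> list[str]:
    # Hypothetical rule: flag companies whose hiring growth is far above the industry peer group
    growth = {name: hiring_growth(series) for name, series in postings.items()}
    mu = mean(growth.values())
    sigma = pstdev(growth.values())
    if sigma == 0:
        return []  # every company grew at the same rate, so nothing stands out
    return [name for name, g in growth.items() if (g - mu) / sigma > cutoff]
```
Comparing each company with its peers, rather than with its own past alone, is what makes this a relative-change signal.
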
88
00:02:25.648 --> 00:02:26.830
What about price monitoring?

89
00:02:27.650 --> 00:02:30.474
The paper mentions tracking prices and inventory globally.

90
00:02:30.912 --> 00:02:31.775
How's that different from just,

91
00:02:31.853 --> 00:02:32.095
you know,

92
00:02:32.193 --> 00:02:33.056
inflation stats?

93
00:02:33.134 --> 00:02:35.076
It's much more granular, and it's real time.

94
00:02:35.201 --> 00:02:43.099
Think about tracking the online price of a specific semiconductor or lumber prices on supplier websites worldwide.

95
00:02:43.287 --> 00:02:43.506
Right.

96
00:02:43.771 --> 00:02:50.615
If you see prices for a key component spiking up consistently and maybe inventories dropping at the same time,

97
00:02:50.990 --> 00:02:53.849
that could signal supply issues or rising costs

98
00:00:54.012 --> 00:00:55.912
before they hit the company's earnings report.

99
00:02:56.452 --> 00:02:58.374
It points towards potential margin pressure.

100
00:02:58.675 --> 00:03:00.155
So a trading rule might be what?

101
00:03:00.913 --> 00:03:01.753
Shorting companies

102
00:03:01.952 --> 00:03:05.733
heavily reliant on that component if the price crosses a certain threshold.

103
00:03:05.913 --> 00:03:06.413
Precisely.

104
00:03:06.679 --> 00:03:09.616
Or maybe going long on the suppliers if they seem to have pricing power.

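NOTE
A sketch of the threshold rule just described, assuming daily scraped prices and inventory counts for one key component. The class and function names, the 20-day window, the 10% price-rise threshold and the requirement that inventory fall at the same time are illustrative assumptions; the paper does not specify them.
```python
from dataclasses import dataclass
@dataclass
class ComponentObservation:
    price: float    # scraped online price of the component
    inventory: int  # scraped units in stock across supplier sites
def component_signal(history: list[ComponentObservation], window: int = 20,
                     price_threshold: float = 0.10) -> str:
    # Hypothetical rule: compare the latest observation with the one `window` days earlier
    if len(history) <= window:
        return "no_signal"  # not enough history to evaluate the rule
    start, end = history[-window - 1], history[-1]
    price_change = end.price / start.price - 1.0
    inventory_falling = end.inventory < start.inventory
    if price_change > price_threshold and inventory_falling:
        # Rising cost plus shrinking supply: candidate short on companies that depend on the part
        return "short_dependent_companies"
    return "no_signal"
```
Requiring both conditions is one way to reduce false alarms from a price blip that is not backed by an actual supply squeeze.
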
105
00:03:10.014 --> 00:03:14.061
It's about translating those real-time web signals into actionable trade ideas.

106
00:03:14.436 --> 00:03:15.702
Then there's sentiment analysis.

107
00:03:16.171 --> 00:03:16.858
Using AI,

108
00:03:17.030 --> 00:03:17.811
machine learning,

109
00:03:18.124 --> 00:03:18.452
NLP,

110
00:03:19.077 --> 00:03:21.936
all that stuff to gauge the buzz online.

111
00:03:22.155 --> 00:03:22.296
Yeah,

112
00:03:22.374 --> 00:03:22.733
this one's...

113
00:03:23.096 --> 00:03:23.657
Fascinating,

114
00:03:23.677 --> 00:03:25.679
but also potentially tricky.

115
00:03:26.018 --> 00:03:26.339
How so?

116
00:03:26.700 --> 00:03:26.919
Well,

117
00:03:27.141 --> 00:03:28.763
the idea is simple enough.

118
00:03:29.602 --> 00:03:32.247
Track what people are saying online about a stock or product.

119
00:03:32.942 --> 00:03:37.231
Use AI to figure out if the chatter is positive or negative and how intense it is.

120
00:03:37.427 --> 00:03:37.591
Okay.

121
00:03:37.911 --> 00:03:40.895
A big sustained surge in negative sentiment,

122
00:03:40.989 --> 00:03:42.380
especially if it seems credible,

123
00:03:43.083 --> 00:03:44.911
might predict the stock price heading down.

124
00:03:45.442 --> 00:03:45.786
Might?

125
00:03:46.020 --> 00:03:46.161
Yeah,

126
00:03:46.442 --> 00:03:47.380
might is the key word.

127
00:03:47.817 --> 00:03:49.364
Online sentiment can be noisy,

128
00:03:49.677 --> 00:03:51.130
easily manipulated sometimes.

129
00:03:51.224 --> 00:03:51.630
So you need

130
00:03:52.088 --> 00:03:53.489
really robust filters.

131
00:03:53.490 --> 00:03:54.930
You can't just trade on every tweet,

132
00:03:54.991 --> 00:03:55.452
obviously.

133
00:03:55.571 --> 00:03:55.712
Right.

134
00:03:55.713 --> 00:03:57.473
You need to separate the signal from the noise.

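NOTE
A minimal sketch of a filtered sentiment rule, assuming each scraped post has already been scored between -1 (negative) and +1 (positive) by some NLP model. The function names, the -0.3 threshold, the three-distinct-sources minimum and the three-day persistence requirement are hypothetical filters meant to separate a sustained, broad-based signal from one-off noise.
```python
from collections import defaultdict
from statistics import mean
def daily_sentiment(posts: list[tuple[str, str, float]], min_sources: int = 3) -> dict[str, float]:
    # posts: (ISO day, source, score); days with too few distinct sources are dropped as noise
    by_day: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for day, source, score in posts:
        by_day[day].append((source, score))
    result = {}
    for day, items in by_day.items():
        if len({source for source, _ in items}) >= min_sources:
            result[day] = mean(score for _, score in items)
    return result
def sustained_negative(posts: list[tuple[str, str, float]], threshold: float = -0.3, days: int = 3) -> bool:
    # Hypothetical rule: bearish only if the last `days` qualifying days all average below `threshold`
    daily = daily_sentiment(posts)
    recent = [daily[d] for d in sorted(daily)[-days:]]
    return len(recent) == days and all(s < threshold for s in recent)
```
A burst of posts from a single account never passes the distinct-source filter, which is a crude guard against the manipulation mentioned in the episode.
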
135
00:03:57.692 --> 00:03:58.095
Makes sense.

136
00:03:58.735 --> 00:03:58.934
Okay.

137
00:03:59.055 --> 00:04:00.720
So you develop these potential rules,

138
00:04:01.274 --> 00:04:02.118
hiring trends,

139
00:04:02.399 --> 00:04:03.118
price spikes,

140
00:04:03.837 --> 00:04:04.657
sentiment shifts.

141
00:04:05.063 --> 00:04:08.602
How do you know if they actually work before risking real capital?

142
00:04:09.024 --> 00:04:10.962
That's where backtesting is absolutely crucial.

143
00:04:11.243 --> 00:04:13.384
You take your rule and run it against historical data.

144
00:04:13.649 --> 00:04:15.134
See how it would have performed in the past.

145
00:04:15.337 --> 00:04:18.477
The paper mentions some key metrics for evaluating that performance.

146
00:04:18.540 --> 00:04:19.149
Not just profit,

147
00:04:19.227 --> 00:04:19.415
right?

148
00:04:19.743 --> 00:04:21.102
Definitely not just profit.

149
00:04:21.348 --> 00:04:38.451
or absolute return, as they call it. You need context. That's where alpha comes in: did your strategy actually beat its benchmark? Did it provide an edge? So alpha is the secret-sauce measure? Kind of, yeah. Then there's beta: how much did your strategy move with the overall market? You

150
00:00:38.529 --> 00:00:49.232
need to understand the systematic risk you took. Okay. And standard deviation tells you about volatility. How bumpy was the ride? Was it a smooth gain or wild swings? Right, risk matters hugely.

151
00:04:49.496 --> 00:04:54.079
Which leads to risk-adjusted metrics like the Sharpe ratio: excess return versus total risk.

152
00:04:54.220 --> 00:04:55.220
And the Sortino ratio,

153
00:04:55.282 --> 00:04:58.103
which is similar but focuses specifically on downside risk,

154
00:04:58.462 --> 00:04:59.626
the bad kind of volatility.

155
00:04:59.923 --> 00:05:00.368
Sortino.

156
00:05:00.767 --> 00:05:01.228
I like that.

157
00:05:01.704 --> 00:05:02.907
Only penalizes for losses.

158
00:05:03.032 --> 00:05:03.571
Exactly.

159
00:05:03.970 --> 00:05:09.017
And R-squared tells you how much of your performance was just the market moving versus something unique your strategy did.

160
00:05:09.313 --> 00:05:09.829
And critically,

161
00:05:09.938 --> 00:05:11.985
you need a relevant benchmark for comparison.

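NOTE
A sketch of how the metrics mentioned above could be computed from a strategy's periodic returns and a benchmark's returns over the same periods. It assumes simple (not log) returns, a constant per-period risk-free rate (zero by default), 252 periods per year for annualizing, and the common regression-style definitions of alpha, beta and R-squared; the function name is hypothetical and the paper may define these metrics differently.
```python
import math
from statistics import mean, stdev
def evaluate(strategy: list[float], benchmark: list[float], rf: float = 0.0,
             periods_per_year: int = 252) -> dict[str, float]:
    # Excess returns over the per-period risk-free rate
    s = [r - rf for r in strategy]
    b = [r - rf for r in benchmark]
    s_mean, b_mean = mean(s), mean(b)
    n = len(s)
    cov = sum((x - s_mean) * (y - b_mean) for x, y in zip(s, b)) / (n - 1)
    beta = cov / stdev(b) ** 2              # sensitivity to the market (systematic risk)
    alpha = s_mean - beta * b_mean          # per-period return not explained by the market
    corr = cov / (stdev(s) * stdev(b))
    downside = [min(x, 0.0) for x in s]     # only below-zero excess returns count as risk
    downside_dev = math.sqrt(sum(d * d for d in downside) / n)
    ann = math.sqrt(periods_per_year)
    return {
        "total_return": math.prod(1 + r for r in strategy) - 1,
        "alpha_annual": alpha * periods_per_year,  # annualized by simple scaling
        "beta": beta,
        "volatility_annual": stdev(strategy) * ann,
        "sharpe": s_mean / stdev(s) * ann,
        "sortino": s_mean / downside_dev * ann if downside_dev else math.inf,
        "r_squared": corr ** 2,
    }
```
Sortino uses the same numerator as Sharpe but divides by downside deviation only, which is why a strategy with mostly upside swings scores higher on Sortino.
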
162
00:05:12.708 --> 00:05:13.008
Always.

163
00:05:13.269 --> 00:05:18.594
So a whole suite of metrics to judge if a web data strategy holds water based on past data.

164
00:05:18.734 --> 00:05:18.973
Yes.

165
00:05:19.356 --> 00:05:20.754
It's about rigorous evaluation.

166
00:05:21.113 --> 00:05:23.176
But even that depends heavily on one more thing.

167
00:05:23.816 --> 00:05:24.496
Data quality.

168
00:05:25.059 --> 00:05:25.520
Garbage in,

169
00:05:25.621 --> 00:05:26.723
garbage out, right?

170
00:05:26.981 --> 00:05:27.106
Ah,

171
00:05:27.145 --> 00:05:27.324
yeah.

172
00:05:27.606 --> 00:05:31.168
If the web data you scraped was wrong or incomplete.

173
00:05:31.387 --> 00:05:33.121
Then your backtest results are meaningless.

174
00:05:33.231 --> 00:05:33.934
Worse than useless,

175
00:05:34.027 --> 00:05:34.324
actually,

176
00:05:34.371 --> 00:05:35.949
because they give false confidence.

177
00:05:36.246 --> 00:05:37.277
So you need clean,

178
00:05:37.574 --> 00:05:38.887
reliable data history.

179
00:05:39.168 --> 00:05:40.481
The paper mentions an audit trail.

180
00:05:40.820 --> 00:05:41.680
Absolutely essential.

181
00:05:41.820 --> 00:05:43.981
You need to be able to track where your data came from,

182
00:05:44.401 --> 00:05:45.341
how it was processed,

183
00:05:45.380 --> 00:05:46.583
and be able to recreate it.

184
00:05:46.981 --> 00:05:49.239
It ensures integrity and reproducibility.

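NOTE
One way to keep the audit trail described here, sketched under assumptions: each scraped snapshot is stored with its source URL, retrieval time, a SHA-256 hash of the raw bytes and the ordered list of processing steps, so a later backtest can check it is using exactly the same input. The record layout, field names and functions are hypothetical, not a format from the paper.
```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
@dataclass
class ProvenanceRecord:
    source_url: str
    retrieved_at: str                               # ISO-8601 timestamp of the scrape
    raw_sha256: str                                 # fingerprint of the raw bytes as downloaded
    steps: list[str] = field(default_factory=list)  # processing applied, in order
def record_snapshot(source_url: str, retrieved_at: str, raw: bytes) -> ProvenanceRecord:
    return ProvenanceRecord(source_url, retrieved_at, hashlib.sha256(raw).hexdigest())
def verify_snapshot(record: ProvenanceRecord, raw: bytes) -> bool:
    # True only if the stored raw data is byte-for-byte what was originally scraped
    return hashlib.sha256(raw).hexdigest() == record.raw_sha256
def to_log_line(record: ProvenanceRecord) -> str:
    # One JSON line per snapshot; an append-only file of these lines forms the audit trail
    return json.dumps(asdict(record), sort_keys=True)
```
Re-running the recorded steps on the verified raw snapshot is what makes the backtest input reproducible.
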
185
00:05:49.723 --> 00:05:52.684
If you can't trust the data underlying the backtest,

186
00:05:52.700 --> 00:05:53.739
you can't trust the strategy.

187
00:05:54.200 --> 00:05:54.981
Makes perfect sense.

188
00:05:55.325 --> 00:05:55.442
So,

189
00:05:55.661 --> 00:05:56.622
it's quite a process then.

190
00:05:57.200 --> 00:05:58.200
Find the right web data,

191
00:05:58.700 --> 00:05:59.950
figure out a smart trading rule,

192
00:06:00.184 --> 00:06:02.481
backtest it rigorously using the right metrics,

193
00:06:02.887 --> 00:06:04.950
and make absolutely sure your data is solid.

194
00:06:05.106 --> 00:06:05.887
That's the path.

195
00:06:06.122 --> 00:06:07.387
It requires discipline,

196
00:06:07.606 --> 00:06:08.153
good tech,

197
00:06:08.512 --> 00:06:08.887
and...

198
00:06:09.592 --> 00:06:10.853
a healthy dose of skepticism.

199
00:06:11.553 --> 00:06:16.977
But the potential payoff, finding those unique insights from the web, is why so many are investing heavily in it.

200
00:06:17.079 --> 00:06:21.626
It really highlights how the search for alpha is pushing into new, complex territory.

201
00:06:21.821 --> 00:06:22.602
Fascinating stuff.

202
00:06:22.641 --> 00:06:23.204
It really is.

203
00:06:23.243 --> 00:06:24.907
The digital footprint is just massive.

204
00:06:25.266 --> 00:06:27.923
Thank you for tuning in to the Papers with Backtest podcast.

205
00:06:28.173 --> 00:06:30.532
We hope today's episode gave you useful insights.

206
00:06:30.907 --> 00:06:32.938
Join us next time as we break down more research.

207
00:06:33.360 --> 00:06:34.751
And for more papers and backtests,

208
00:06:34.782 --> 00:06:37.813
find us at https://paperswithbacktest.com.

209
00:06:38.293 --> 00:06:38.956
Happy trading.

210
00:06:38.997 --> 00:06:39.560
Happy trading.

