March 12, 2008

Money:Tech

I have been shamed into going back and putting something new on this blog. Attending the O'Reilly Money:Tech conference in NY in Feb 08 and seeing rampant real-time blogging was a wake-up call.

An unusual conference, with a mix of nerds and Wall Street sorts, and more than a few hybrids such as yours truly. The presentations from Money:Tech are here.

April 10, 2007

Stupid Data Miner Tricks

This collective web thing actually works. I got an email from ace quant investment manager John Bogle, who'd seen a post from Paul Kedrosky. Both were looking for a copy of Stupid Data Miner Tricks, a paper in the current Journal of Investing.

The JoI is not fully onboard the "information wants to be free" train, so as a good citizen of the interweb series of tubes, I'm depositing an earlier version right here. Download dataminejune_2000.pdf

Here's the introduction:

Disraeli's warning that "there are three kinds of lies: lies, damn lies and statistics"  is particularly true when too much computation is applied to too little data. This paper presents some egregious yet instructional examples of data mining, and describes ways to avoid similar mishaps.

It started out over ten years ago as a set of joke slides showing silly spurious correlations. These statistically appealing relationships between the stock market and dairy products and third world livestock populations have been cited often, in Business Week, the Wall Street Journal, the book “A Mathematician Looks at the Stock Market”, and many others. Students from Bill Sharpe’s classes at Stanford seem to be familiar with them. This was expanded to include some actual content about data mining, and reissued as an academic working paper in 2001. Occasional requests for this arrive from distant corners of the world. So I’d like to thank the editors of the Journal of Trading for publishing this.

Without taking a hatchet to the original, the advice here is still valuable, perhaps more so, now that there is so much more data to mine. Monthly data arrives as one data point, once a month. It’s hard to avoid data mining sins if you look twice. Ticks, quotes, and executions arrive in millions per minute, and many of the practices which fail the statistical sniff tests for low frequency data can now be used responsibly. New frontiers in data mining have been opened up by the availability of vast amounts of textual information. Whatever raw material you choose, fooling yourself remains an occupational hazard in quantitative trading.
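The "look twice" problem is easy to reproduce. What follows is a minimal sketch, not code from the paper: it assumes a short sample of 12 monthly observations and candidate predictors that are pure random noise, and the function names (`pearson`, `best_spurious_fit`) are made up for illustration. It shows how the best correlation found grows as more candidates are tried.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def best_spurious_fit(n_obs, n_candidates, seed=0):
    """Search random noise series for the one that best 'explains' a
    random target. Every series is noise, so any fit is spurious."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Assumption: 12 observations, standing in for one year of monthly data.
few_looks = best_spurious_fit(n_obs=12, n_candidates=1)
many_looks = best_spurious_fit(n_obs=12, n_candidates=1000)
print(f"best |r| after 1 look:      {few_looks:.2f}")
print(f"best |r| after 1,000 looks: {many_looks:.2f}")
```

With only a dozen data points, a thousand looks will reliably turn up a noise series that appears strongly related to the target, which is the mishap the paper warns about.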


PS - DIY dataminers will want to check this: http://swivel.com/

 

March 22, 2007

Algo vs. Algo - With Pictures and Links


Algo vs. Algo is in the February 2007 issue of Institutional Investor Alpha Magazine, on pages 44-51.

Alas, the financial visualization examples could not be included in the print version. And copying long links is tedious, so I offered an electronic version with links and pictures.
It's downloadable here: Download AlgosEdge0103.pdf

Here's a nice sample from Oculus, which will even move if the animated gif below works when you click the picture.

What you see (here or at the Oculus site) is a movie of the microstructure of the market for ORCL.  The limit order book shows sells in red or yellow and buys in blue and green. The heights of the "buildings" show size. The trade history recedes from the front on a visual conveyor belt, with the heights of those purple buildings corresponding to trade size. A striking feature is that when the bulk of buildings on the  buy or sell side of the book show a large imbalance, the price tends to move in that direction.

Hmm, maybe there is something to what they say about supply and demand.
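For the numerically inclined, the imbalance visible in the picture can be reduced to a single number. This is only a rough sketch under my own assumptions: the function name `book_imbalance`, the normalization to the range -1 to +1, and the sample prices and sizes are all hypothetical, and none of it comes from Oculus or the article.

```python
def book_imbalance(bids, asks):
    """Return the size imbalance of a limit order book, from -1 to +1.

    bids and asks are lists of (price, size) tuples. Positive values
    mean more resting size on the buy side, negative on the sell side.
    """
    bid_size = sum(size for _, size in bids)
    ask_size = sum(size for _, size in asks)
    total = bid_size + ask_size
    if total == 0:
        return 0.0  # empty book: no imbalance to report
    return (bid_size - ask_size) / total


# Hypothetical snapshot of a book; prices and sizes are invented.
bids = [(17.01, 5000), (17.00, 8000), (16.99, 12000)]
asks = [(17.02, 2000), (17.03, 1500), (17.04, 1500)]
imbalance = book_imbalance(bids, asks)
print(f"imbalance: {imbalance:+.2f}")
```

A reading well above zero corresponds to the taller "buildings" sitting on the buy side of the book.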

Emvmoviefinal_3

This is suggestive of how a trader steering an algorithm might view the process. There would be overlays indicating which of the trades were yours, and the control parameters and levels for the algo.

February 17, 2007

Information Driven Price Moves - A Case Study

Download CaseStudy1.pdf

This is a micro-level trace of how information is disseminated and incorporated in financial markets in an era of great flux in news and digital “news-like” information. The case involves Accentia Biopharmaceuticals, which was the largest percentage gainer in US stock markets on the morning of Oct 19, 2006.

Several interesting issues are raised here.

  • How should investors and traders, and their information providers, deal with disintermediated information?
  • There is more news than people can handle, and far more disintermediated news. Some means beyond basic keyword search is needed.
  • Are examples like this one, which has language that would be of interest any time, useful as the basis for “more like this” persistent search?
  • How to obtain the “meta-knowledge” (e.g. in this case, the relationship between the vaccine name and the company that makes it) needed to interpret news in a trading and investing context? What combination of automation and human contributions is appropriate?

 

 

eInformation


Technology has utterly reworked the "plumbing" of markets: dissemination of prices, orders, and clearing. A new frontier in financial technology is dealing with information used to make decisions that arrives as text or web content, as eInformation.

More on eInformation (a slight update of an essay from 2002, by Dave Leinweber and Peter Tufano)

A Transformation in Financial Information

Information is the raw material for any investment or trading strategy, and technology can radically alter the information landscape. More than a century ago, a new technology, the transatlantic telegraph cable, had a major impact on the functioning of financial markets by suddenly bringing the prices of securities in New York and London in line with one another. The Internet may offer a similar revolution in communication and may influence financial markets. Recent years have witnessed the creation of new means by which information, opinions and analyses can be shared among investors. Information frictions are dramatically reduced. Information is impounded in prices much faster. Press releases used to be read by editors; now they are read by everyone. Local newspapers were read locally; now everything is global. Information sharing used to occur within small groups; now there are hundreds of millions of people reading and writing messages on the “Internet wall”. Since the late 90s, we have seen an astonishing growth of freely accessible information on the Net, and an equally astonishing growth in the number of people able to act on this information. There is more and more information (and disinformation) available, and with over 20 million online brokerage accounts, it is impounded in prices at ever increasing speeds. This transformation is dramatic.

The e-Information Project

Hundreds of articles on the informational efficiency of financial markets have been published over the past quarter century in major finance and accounting journals. However, the changes brought about by the net have created new opportunities to learn from empirical examination of the information in markets and the relationship of this information to the price formation process, to returns, to volatility, volume, and liquidity. Which types of information events lead market response, and which types are responses to market activity?

The e-Information group brings together computer scientists and financial economists with the goal of reaching an understanding of the new media for information transmittal and the relationships between market and information. In particular, Internet-based stock message boards provide a real time window into the minds of some individual investors. Our research team, among others around the country, is seeking to understand how this new technologically-enabled form of communication works.  Our research seeks to understand the salience of what we call "e-information": investor sentiment and disagreement, extracted from the Web using statistical language processing.

Below you will find some of our preliminary work, as well as links to related commercial and academic work. We are currently working on the construction of a large dataset to test a variety of intriguing propositions about e-information and the stock market. If you find other related links, let us know and we can post them for other researchers to share.

Sanjiv Das, Santa Clara University
David J. Leinweber, California Institute of Technology
Francisco de Asis Martinez-Jerez, Harvard Business School
Peter Tufano, Harvard Business School

Our Research:

  • "e-information". Sanjiv Das, Asis Martinez-Jerez, and Peter Tufano.  In this clinical study, we create a new micro-database of much of the information flow about four stocks for a period of six months, including a novel database of e-Information. We define e-Information as estimates of the intensity and dispersion of investor sentiment extracted from four major stock chat message boards. We combine this e-Information with other components of the traditional information set (formal press releases by the firms, filings, analyst revisions, and news stories available electronically) to create an intensive database of as close to the full information flow as is practical. We document certain patterns in the release of information and show that there are suggestions of links between our measure of e-Information and contemporaneous stock returns and implied volatility. While e-Information does not seem to predict subsequent stock price returns, it may be related to the implied volatilities taken from options prices. (Revision dated November 2001)
  • "Yahoo! for Amazon: Opinion Extraction From Small Talk on the Web" Sanjiv Das and Mike Chen. This paper describes the algorithm used to extract sentiment from the web. In addition to the paper, additional resources are given below.
  • "Three hundred years of stock market manipulation," David Leinweber and Ananth Madhavan, The Journal of Investing 10 (7/Summer 2001), 7-16. This paper shows how stock chat boards can be used to manipulate stock prices.
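To make "intensity and dispersion of investor sentiment" concrete, here is a toy sketch. It is not the method from the papers above, whose exact definitions are not given here. It assumes each message has already been classified as buy (+1), neutral (0), or sell (-1), and it takes the mean label as intensity and the standard deviation as dispersion. The function name and sample labels are hypothetical.

```python
from statistics import mean, pstdev


def sentiment_summary(labels):
    """Summarize classified messages as (intensity, dispersion).

    labels: list of +1 (buy), 0 (neutral), -1 (sell).
    Assumption: intensity is the mean label, and dispersion is the
    population standard deviation, used as a proxy for disagreement.
    """
    if not labels:
        return 0.0, 0.0
    return float(mean(labels)), float(pstdev(labels))


# Hypothetical labels for one stock's message board over one day.
labels = [1, 1, -1, 0, 1, -1, 1, 1]
intensity, dispersion = sentiment_summary(labels)
print(f"intensity:  {intensity:+.3f}")
print(f"dispersion: {dispersion:.3f}")
```

A board where everyone agrees would show high intensity and low dispersion, while a heated argument would show the reverse.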

Commercial sites seeking to understand how e-information works:

Blogs dealing with eInformation:

Selected related research: