Trade and market data repositories hold documents in many formats, including the space-hungry, XML-based FpML format. FpML is the recognized standard for communicating with clearing houses and counterparties. Hadoop, as a distributed file system, can obviously store all these documents. An even better option is HBase, which brings several important properties:
- Structured, interactive and indexed
- Versions and timestamps
- Hadoop-based: reliable, HA, extensible and cost-effective
Structured, interactive and indexed
In the end, we would like the trade repository to contain:
- Each individual trade in different formats: the original from the internal system (proprietary, Murex, Misys, Calypso, ...), FpML, and a pivot format if several internal systems are involved
- All raw messages exchanged with clearing houses and counterparties: FpML, FIX, ...
- Market data from usual vendors
- Reports: consolidated risk, VaR, CVA, FpML, ...
We would like to get the best of both worlds: a traditional database for “structured data” and a file system for “unstructured data”. HBase structures data somewhat differently from a traditional database, which allows structured information and even a hierarchical organization of data.
A typical record in HBase: a row is identified by its row key and associated, in a very flexible way, with a set of values.
This property makes it possible to build any kind of structure or data model. The key-value data organization enables atomic CRUD operations in a structured way, which is the prerequisite for an interactive system. CRUD stands for create, read, update, delete.
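The record structure described above can be pictured as a nested map. The following is a minimal in-memory sketch in Python, not the HBase client API: the class name `ToyTable`, the column names and the sample trade are all hypothetical, chosen only to show how a row key maps to flexible `family:qualifier` columns and how the CRUD operations apply to them.

```python
from typing import Dict, Optional


class ToyTable:
    """Toy stand-in for an HBase table: rowkey -> {"family:qualifier": value}."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, bytes]] = {}

    def put(self, rowkey: str, column: str, value: bytes) -> None:
        # Create and update are the same operation: write a cell.
        self._rows.setdefault(rowkey, {})[column] = value

    def get(self, rowkey: str, column: Optional[str] = None):
        # Read one cell, or a copy of the whole row when no column is given.
        row = self._rows.get(rowkey, {})
        return dict(row) if column is None else row.get(column)

    def delete(self, rowkey: str, column: Optional[str] = None) -> None:
        # Delete one cell, or the whole row when no column is given.
        if column is None:
            self._rows.pop(rowkey, None)
            return
        row = self._rows.get(rowkey)
        if row is not None:
            row.pop(column, None)
            if not row:
                # A row with no cells left no longer exists.
                del self._rows[rowkey]


table = ToyTable()
# Hypothetical trade row; different rows may carry different columns.
table.put("TRADE-0001", "doc:fpml", b"<FpML>...</FpML>")
table.put("TRADE-0001", "meta:source", b"Murex")
table.put("TRADE-0001", "meta:source", b"Calypso")  # update in place
table.delete("TRADE-0001", "doc:fpml")
print(table.get("TRADE-0001"))
```

Because columns are created on write, a row holding an FpML document and a row holding a FIX message can live in the same table without a shared schema, which is the flexibility the text refers to.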
Even though HBase does not implement secondary indexes, each row can be retrieved efficiently by its row key, which gives HBase a basic indexing capability. In the next section, we will see how we have complemented it with an efficient indexing and search feature.
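HBase keeps rows sorted by row key, so a well-chosen composite key already acts as a primary index: all rows sharing a key prefix are contiguous and can be read in one range scan. The sketch below imitates that behaviour with a sorted Python list. The key layout `counterparty#tradeId` and the helper `prefix_scan` are illustrative assumptions, not part of HBase or of the repository described here.

```python
import bisect
from typing import List


def prefix_scan(sorted_keys: List[str], prefix: str) -> List[str]:
    """Return every key starting with `prefix` from a lexicographically sorted list."""
    # Binary search for the first key >= prefix, then walk forward
    # while keys still share the prefix (they are contiguous when sorted).
    index = bisect.bisect_left(sorted_keys, prefix)
    matches: List[str] = []
    while index < len(sorted_keys) and sorted_keys[index].startswith(prefix):
        matches.append(sorted_keys[index])
        index += 1
    return matches


# Hypothetical row keys built as "<counterparty>#<trade id>".
keys = sorted([
    "ACME#T001",
    "BETA#T001",
    "ACME#T002",
    "ACME2#T009",
    "ZETA#T003",
])

acme_trades = prefix_scan(keys, "ACME#")
print(acme_trades)
```

Including the `#` separator in the prefix keeps `ACME2` out of the result. Any query that does not start from the leading part of the key (for example "all trades with a given ID across counterparties") cannot use this ordering, which is why a separate indexing layer is still needed.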
Versions and timestamps
This feature of HBase is very useful for implementing a full audit trail of the modifications made to every piece of data. It also makes it possible to retrieve the state of the data as of a given date.
A modified value. The active (latest) value is shown in bold, but older values can still be retrieved: all modifications are kept and time-stamped.
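The versioning behaviour can be illustrated with a toy model of a single cell. This is a minimal sketch in plain Python, not the HBase API: `VersionedCell` and its methods are hypothetical names, and timestamps are small integers for readability, whereas HBase by default stamps each write with a millisecond clock value.

```python
import bisect
from typing import List, Optional, Tuple


class VersionedCell:
    """Toy cell that keeps every written value together with its timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []  # kept sorted ascending
        self._values: List[str] = []      # parallel to _timestamps

    def put(self, timestamp: int, value: str) -> None:
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            # Same timestamp: replace that version.
            self._values[i] = value
        else:
            self._timestamps.insert(i, timestamp)
            self._values.insert(i, value)

    def get(self, as_of: Optional[int] = None) -> Optional[str]:
        """Latest value, or the value that was active at time `as_of`."""
        if not self._timestamps:
            return None
        if as_of is None:
            return self._values[-1]
        # Number of versions written at or before `as_of`.
        i = bisect.bisect_right(self._timestamps, as_of)
        return self._values[i - 1] if i > 0 else None

    def history(self) -> List[Tuple[int, str]]:
        """Full audit trail, oldest first."""
        return list(zip(self._timestamps, self._values))


# Hypothetical notional amount amended twice.
notional = VersionedCell()
notional.put(100, "1000000")
notional.put(200, "1500000")
notional.put(300, "1250000")

print(notional.get())        # active value
print(notional.get(as_of=250))
print(notional.history())
```

In HBase itself, the number of versions retained is a column-family setting, so a complete audit trail assumes that setting is high enough for the expected number of amendments.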
HBase guarantees that any write it has acknowledged is actually stored and kept. It is not fully ACID (see http://hbase.apache.org/acid-semantics.html), but operations are atomic at the row level, and it provides a sufficient level of consistency for a trade repository (we have complemented this in HBase).
Hadoop-based: reliable, HA, extensible and cost-effective
HBase is built on top of Hadoop and benefits from all the properties listed in the previous chapter.
Read more by downloading our white paper. We explain how to use Big Data low layers as the foundation of your local trade and market data repository, and why using Hadoop and HBase is the most efficient and cost-effective option.