DATA MANAGEMENT: it’s more about “WideData” than BigData
The systemic meltdown in 2008 and the subsequent ripple effect of overexposures and bad debt led to a global overhaul of regulation. Whether as part of major reforms, like the Dodd-Frank Act and the European Market Infrastructure Regulation (EMIR), or as part of local implementations of regulatory changes, the key underlying theme in all of them is data management. In this article, Gavin Kaimowitz, Simon Lyon and Geoff Cole discuss the importance of the interconnectedness of data and how a focus on “WideData” will be needed to support future regulatory requirements.
Rather than thinking about the physical size of data, consider the importance of being able to relate sets of data. For financial services firms, the complexity doesn’t lie in the need to store more data; it lies in how best to link, arbitrate and cleanse all of the data required to support regulatory needs. So, it becomes less about how big the data is and more about its breadth and reach across the organization. The new challenge in the next few years will revolve around managing WideData.
Regulations Influencing the WideData Trend
In the next two years, regulations will be enforced across a variety of operational and technical areas. The following regulatory requirements highlight the need for investment in strategic data programs:
Data Retention, Search and Retrieval
The data retention rules within the Dodd-Frank Act business conduct standards require registered Swap Dealers (SDs) and Major Swap Participants (MSPs) to record, store and reconstitute all information related to a trade within 48 hours of a request from the Commodity Futures Trading Commission (CFTC).
It is expected that firms will be able to provide complete audit information of a trade, related telephone conversations, analyst reports, sales and marketing documentation, emails, instant messages and any other pertinent information that aided in the investment decision—including all pre-trade activity (quotes, bids, offers, solicitations) relating to the inception of a deal.
Impact: Firms need a mechanism to correlate disparate transaction records with records of communications—either through automated or manual processes.
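As a simplified sketch of what an automated correlation mechanism might look like, the snippet below links a trade record to communication records by matching the counterparty and a time window around execution. All record layouts, field names and values are hypothetical; a production process would match on far richer keys (traders, instruments, message threads).

```python
from datetime import datetime, timedelta

# Illustrative records -- every field name and value here is made up.
trades = [
    {"usi": "USI-001", "counterparty": "ACME-LLC",
     "executed_at": datetime(2013, 3, 1, 10, 30)},
]
comms = [
    {"type": "email", "counterparty": "ACME-LLC",
     "sent_at": datetime(2013, 3, 1, 9, 55), "subject": "Indicative quote"},
    {"type": "call", "counterparty": "OTHER-CO",
     "sent_at": datetime(2013, 3, 1, 10, 0), "subject": "Unrelated call"},
]

def correlate(trade, communications, window=timedelta(hours=2)):
    """Return communications with the same counterparty that occurred
    within a time window around execution -- a naive first-pass linkage."""
    return [
        c for c in communications
        if c["counterparty"] == trade["counterparty"]
        and abs(c["sent_at"] - trade["executed_at"]) <= window
    ]

linked = correlate(trades[0], comms)
```

The point of the sketch is the shape of the problem: transaction data and communication data live in different stores, so some shared attribute (here, a counterparty name and a timestamp) must be captured on both sides before any correlation is possible.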
OTC Swap & Derivative Counterparty Identification
Global trade repository reporting requirements mandate using standard counterparty identifiers in order to provide regulators with the ability to aggregate and identify potential risk areas. In the absence of a global legal entity identifier (LEI) as proposed by the G20 Financial Stability Board (FSB), the CFTC has required the use of interim identifiers. In other regulatory regimes, the lack of a global standard may provide greater tolerance and allow the use of existing standard identifiers, like SWIFT BIC codes, as the best available and easiest options for firms to achieve compliance.
Impact: Firms need to make “best efforts” to clean counterparty reference data to support immediate regulatory requirements while maintaining flexibility to transition to a global standard in the near future.
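That flexibility can be sketched as a simple resolver, assuming illustrative field names: prefer a global LEI when present and fall back to an interim regulatory identifier, then a SWIFT BIC, so that the eventual transition to the global standard requires no change to calling code.

```python
def best_identifier(entity):
    """Return the best available identifier scheme and value for a
    counterparty record, preferring a global LEI, then an interim
    regulatory identifier, then a SWIFT BIC. Field names are illustrative."""
    for field in ("lei", "interim_id", "bic"):
        value = entity.get(field)
        if value:
            return field, value
    return None, None

# Hypothetical counterparty with no LEI yet -- the interim ID is chosen;
# once an LEI is populated, the same call returns it instead.
cp = {"lei": None, "interim_id": "INTERIM-123", "bic": "TESTGB2L"}
scheme, value = best_identifier(cp)
```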
Trade Reporting and Transparency
The Dodd-Frank Act will be followed by other global regulations that require the disclosure of unprecedented levels of detail for each and every derivative transaction. Public dissemination of trade execution data in near real-time will provide all market participants with indications of liquidity and levels of activity. Additionally, the reporting of full primary economic terms and valuations data will give regulators the ability to calculate risk and police markets with greater levels of scrutiny in attempts to mitigate systemic risk.
Impact: Firms must enhance trade capture from both a timeliness and completeness perspective and continually enrich data as it flows through to confirmation and settlement systems. This increases the reliance on data quality, especially as it pertains to counterparty information.
Unique Transaction Identifiers
Perhaps the most critical data element of trade repository reporting requirements is the use of a Unique Swap Identifier (USI) or Unique Transaction Identifier (UTI). The transaction identifier must be shared with the counterparty at or near the point of execution and is of critical importance for regulators seeking to understand “the story” of a derivative transaction over time. Additional requirements necessitate the linking of transactions by USI in order to indicate how risk may have been transferred through novations and other post-trade life cycle events.
Impact: Booking processes and trade capture systems must evolve to support a range of new transaction identifiers with increased functionality to support transaction linkage and life cycle event traceability.
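The linkage requirement can be sketched as follows, assuming hypothetical life cycle records in which each event carries its own USI plus a `prior_usi` pointer to the transaction it supersedes; walking those pointers backwards reconstructs “the story” of the trade.

```python
# Hypothetical life cycle records linked by USI; a novation points back
# at the transaction it replaces via "prior_usi". Values are made up.
events = [
    {"usi": "USI-A", "event": "execution", "prior_usi": None},
    {"usi": "USI-B", "event": "novation",  "prior_usi": "USI-A"},
    {"usi": "USI-C", "event": "amendment", "prior_usi": "USI-B"},
]

def trade_story(usi, events):
    """Follow prior_usi links backwards to reconstruct the chain of
    life cycle events ending at the given USI, oldest event first."""
    by_usi = {e["usi"]: e for e in events}
    chain = []
    current = by_usi.get(usi)
    while current is not None:
        chain.append(current)
        current = by_usi.get(current["prior_usi"])
    return list(reversed(chain))

story = trade_story("USI-C", events)
```

The design point is that traceability only works if the linking identifier is captured at booking time; it cannot be reconstructed reliably after the fact.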
Foreign Account Tax Compliance Act (FATCA)
Enacted in 2010 as part of the Hiring Incentives to Restore Employment (HIRE) Act, the Foreign Account Tax Compliance Act (FATCA) was introduced to combat tax evasion by US persons through the use of offshore vehicles; the final regulations were published at the end of January 2013. FATCA requires Foreign Financial Institutions (FFIs) to identify US accounts and report certain information about those accounts to the US Internal Revenue Service (IRS) annually. Non-compliance results in a 30% withholding tax on US-source payments to the FFI. A number of Inter-Governmental Agreements (IGAs) have been signed with the IRS, including a reciprocal agreement with the UK. IGAs may provide a slightly less onerous approach than full IRS registration.
Impact: Client data needs to be correlated and cleansed to ensure that the holistic view of a client can be analyzed to determine if the client is subject to the FATCA ruling.
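As a simplified, hypothetical sketch of that consolidation step: group duplicated client records into a single view by a normalized key, then flag the consolidated client if any underlying record shows US indicia. Real matching uses far richer keys (addresses, identifiers, fuzzy matching), so treat this purely as the shape of the logic.

```python
# Hypothetical client records duplicated across regional systems.
records = [
    {"client_id": "C1", "name": "Jane Doe ", "country": "GB",
     "nationality": "US"},
    {"client_id": "C7", "name": "jane doe", "country": "US",
     "nationality": None},
]

def consolidate(records):
    """Group records by a normalized name key to approximate a single
    client view (a stand-in for proper entity-matching logic)."""
    merged = {}
    for r in records:
        key = " ".join(r["name"].lower().split())
        merged.setdefault(key, []).append(r)
    return merged

def has_us_indicia(client_records):
    """Flag a consolidated client if any record suggests US status."""
    return any(r["country"] == "US" or r["nationality"] == "US"
               for r in client_records)

clients = consolidate(records)
flags = {k: has_us_indicia(v) for k, v in clients.items()}
```

Note that neither record alone would necessarily be flagged; only the holistic view, once duplicates are merged, surfaces the combination of indicia.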
Achieving Control through Data Governance
In 2012, the major investment focus within data management was around Legal Entity data. This was primarily due to the regulations concerning the reporting of OTC derivative trades and FATCA. In both of these situations, investment firms are required to provide a single consolidated view of their customers and to disclose their counterparty. While this might seem like a very simple exercise, the complexity of creating a single customer view is both technically and politically challenging.
Within multinational banks that run retail and Investment Banking (IB) divisions, the physical technology infrastructure is commonly split by region and business line. Data is commonly duplicated and is often inconsistent. This is not necessarily an IT infrastructure problem, but is likely the result of the manner in which a business operates or has grown (organically or with acquisitions).
In order to break down these barriers, more firms are changing their organizational structures by introducing a Chief Data Officer (CDO). The role of the CDO is to ensure that data is handled consistently throughout the organization—and with standard policies and measures.
The data retention and data lineage regulations are causing widespread debate this year between IT organizations and the business over how to implement solutions that balance the need for an immediate answer to regulatory requirements against building a strategic solution that will sustain the business over the next ten years. The main issue seldom relates to technology, but rather to the ability to correlate data from disparate data sources and to intelligently manage the bi-temporal aspects of data. As a recent example, the Libor probe is forcing all firms under investigation to look through several years’ worth of information, resulting in a massive manual effort to correlate the findings.
All processes within financial institutions track different data points. In principle, these data points can all be related to one another, provided the attributes that link them are captured. However, it is not possible to know today which linking attributes will eventually be required, as new data types are constantly being introduced.
So when firms begin to map out an ontology to represent WideData within financial services, individual data facts can potentially map to a variety of different data types, including newer types that aren’t typically linked, such as internet content and unstructured data.
Next Generation Data
As the breadth of related data relationships increases and new types of datasets become available, new correlations will be possible. The trend toward new data in capital markets includes the recently mandated Legal Entity Identifiers (LEIs), Unique Transaction Identifiers (UTIs), Unique Product Identifiers (UPIs) and other similar identifiers. As new symbologies are implemented, more accurate correlations can be formed. These new correlation points yield a new dimension of possibilities.
For example, the European DataWarehouse provides securitized loan-level information with detailed data points down to the four-letter post code and the demographics of the borrower. As described in Figure 3, these two data points allow data found on the internet to be correlated with financial products.
Structured and Unstructured
Banks are typically faced with the following two types of data that are required to be correlated and have data governance oversight throughout the record life cycle:
Structured Data: data that is physically placed in a format that is easy to access through technology and where the relationships between the data are defined. Technologies used to manage structured data include relational databases, XML and other structured flat file formats.
Unstructured Data: typically in the form of free text-based artifacts, such as documents (e.g., Investment Research, ISDA Master/CSA, Prospectuses, Legal Contracts), telephone conversations, email chains, logs, video, pictures and other similar items.
For each of these types of data and based on the desired usage, there are appropriate technologies that can aid in the correlation of data. Regardless of the technology chosen, the ability to make correlations between these two data types is crucially important to meet regulatory demands.
Historically, the management of data that is not stored in a traditional relational data store has been complicated. However, recent advances in technology mean an array of products can now provide scalable and reliable solutions. These products are typically implemented as ontology-based, semantic and natural language processing (NLP) solutions, which allow the data to remain close to the native format in which it is found and let the products act as an interpretation and rich data mining framework.
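As a toy illustration of the entity-linking idea behind such tooling (real products use full NLP and semantic models; every name and identifier below is made up): scan free text for known counterparty aliases and resolve them to canonical identifiers in the structured store.

```python
import re

# Hypothetical alias table mapping free-text names to canonical
# counterparty IDs held in the structured reference data store.
aliases = {
    "acme llc": "CP-001",
    "acme": "CP-001",
    "globex corp": "CP-002",
}

def tag_entities(text, aliases):
    """Naive entity linking: normalize the text, then return the set of
    canonical counterparty IDs whose aliases appear in it."""
    normalized = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    found = set()
    for alias, canonical_id in aliases.items():
        if alias in normalized:
            found.add(canonical_id)
    return found

ids = tag_entities("Spoke to Acme LLC about the CSA terms.", aliases)
```

Even this crude pass shows the payoff: once an email or call transcript is tagged with the same canonical identifier used by the trade store, the two data types become joinable.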
Competitive Advantage of WideData
The Battle of Agincourt is renowned for the triumph of 6,000 English over 36,000 French. With just one sixth the number of troops, the English won because they had access to information the French did not: they knew, for example, that the ground was saturated with water and that anyone marching across it in heavy armor would get stuck. The event illustrates that a single piece of information can confer a significant competitive advantage.
In more recent times, Michael Burry showed that it was possible to predict the systemic crash of the mortgage market and to pick the best Mortgage-Backed Securities (MBS) to hedge against. While the information he used was freely available, his competitors were not paying attention to the correlations.
For centuries, information has been leveraged to yield an advantage. This still holds true today. In today’s economy, investment organizations are trying to gain an edge by knowing more than their competitors. Quants, for example, have been gaining a competitive advantage by seeking alpha based on highly confidential methodologies.
WideData offers a new way for firms to establish a competitive advantage and the market has already begun to take notice. This can be seen in the algorithmic trading industry with sentiment feeds, data providers offering supply chain information and data utilities providing transparency into loan-level characteristics. These new offerings are enabling firms to begin performing new types of analyses which can put them well ahead of the competition.
Planning for Constant Change
Regulations will continue to change, analytic requirements will evolve and business activities will advance to handle more complex products. The only way firms can prepare for this constant change is to strategically design and plan for a consistent way to manage WideData, enforce standard data governance techniques and ensure that the technical implementation of the data fabric in the organization is built intelligently to scale both wide and deep. After all, intelligence is more about synapses than neurons: given two organisms with the same number of neurons, the one with more synapses is likely to be the more capable.
Gavin Kaimowitz is Sapient Global Markets’ Global Data Practice Lead across the capital and commodity sectors. Gavin is responsible for the collation and generation of best practices, thought leadership and strategy. He has a proven track record in the reference and market data domain, spanning solution design, business case definition, strategic roadmap design and the building of data management products.