Wednesday, October 29, 2008

SOA, EDA and CEP

There has been some coverage in the blogosphere on how SOA, EDA and CEP relate to each other.
"So CEP is not EDA, EDA is more than CEP. Promoting CEP as being EDA is far too simple. And yet that is what is happening in the current IT space."
"Especially the vendors of event processors focus too much on CEP as being EDA".
Mark Palmer from StreamBase answered:
"CEP, in fact, is a really important element of an effective EDA. It's not a required element, but it sure makes it better"
That's how I presented it at JavaOne back in summer 2007, and it still holds true - no matter how vendors mash up various concepts to differentiate themselves from one another (CEP and Grid, CEP and Web2.0, CEP and DDS, CEP and edge processing, CEP and SOA, CEP and BRMS, CEP and XTP etc.).
Here are two excerpts from the presentation I delivered with Thomas Bernhardt from Esper/EsperTech fame:




If you talk to folks using CEP (build or buy - does not matter) in some industries such as investment banking, they'll tell you this has nothing to do with SOA but has to do with XTP.

It's time to learn from past history.

A similar debate occurred not that long ago around the SOA and ESB terms (2005 article on ComputerWorld). For the record, back in 2001 ESB and SOA did not even exist as terms. Yet, some visionaries such as the Sonic folks introduced ESB as a COTS product, despite the presence of established or maturing MOMs such as TibcoRV, MQ, or JMS-based/J2EE MOMs and a whole set of in-house built solutions.

In 2008, I'd be curious to hear from anyone who would disagree with the statement below, which is in fact a copy/paste/replace of the comments currently circulating in the blogosphere about CEP and EDA, as quoted at the beginning of this post:
An ESB is not SOA, SOA is more than ESB. Promoting ESB as being SOA is far too simple. And yet that is what happened in the early days (back in 2001).
Got it now? Here is my bold claim:
A CEP engine is to EDA what an ESB is to SOA.
(And it's also more than just that)

Esper - course on DSMS at University of Oslo

The University of Oslo, Norway, is running some courses in 2008 on Data Stream Management Systems (DSMS) - some of them based on Esper.
DSMS was the term coined in academic work, in opposition to DBMS (database), in the "early days" before the more marketed Complex Event Processing (CEP) acronym was introduced. CEP is now more widely used by vendors, press and analysts as productization and real world deployments are taking place.

Some course materials are available online, and they are worth reading because they recap the key concepts, fit for purpose, and mechanics of DSMS/CEP engines such as TelegraphCQ, Stream, Borealis and Esper.

There are also 2 project assignments that plan to leverage Esper - which is an excellent illustration of how open source can be leveraged in the academic world.
  • Extending a DSMS benchmark
  • Online analysis of medical sensor data with the Esper Event Stream Processing System
There is obviously great confidence that the Esper project will not be closed down anytime soon for monetization purposes - unlike what happened with other such DSMS projects (if not all...). The direction around Esper and enterprise open source is already crystal clear, and that's definitely a win-win for academics, the community, and enterprise practitioners.

Thursday, October 16, 2008

Complex Event Processing and Data Fusion - Esper demo

I have put together a live demo of a CEP track and trace application based on Esper Complex Event Processing (CEP).

The entire scenario has been designed and developed by Data Fusion Research Center AG, a Switzerland-based company "that is quickly becoming a leading center of knowledge, research, and development in the field of geospatial data fusion and analysis". DFRC has chosen Esper for their CEP solution, and has kindly made preliminary results of their work available. I have rewrapped their application bits into a single Java WebStart package so that you can run it securely in a sandboxed environment without installing anything.

The Scenario

The scenario illustrates:
  • Edge computing, with raw events coming from radio sensors or ground based radars at the boundaries of a classical IT infrastructure
  • Complex Event Processing, an automated way of deriving coarse events out of real-time event streams, using advanced concepts such as time driven computations and causality. A CEP solution typically comes with an abstracted programming model or event processing language so as to empower application developers with a continuous query paradigm.
  • Data Fusion, a "set of techniques that combine data from multiple sources and gather that information in order to achieve inferences, which will be more efficient and potentially more accurate than if they were achieved by means of a single source" (wikipedia). Data Fusion can sometimes appear as a conceptual superset of CEP. In the DFRC application, the CEP algorithm can be tuned in several ways - which directly maps to the "potentially more accurate" goal of the Data Fusion approach.
  • Rich client application, to empower the business user and represent CEP / Data Fusion derived information in the most efficient way (in this case a satellite map with real time moving icons for identified aircraft). CEP + BAM or CEP + BI - it all comes down to materializing real-time information out of the CEP / Data Fusion engine.
The Problem

Assume ground radars are pushing position events from disparate non-identified sources flying all around (friend or foe, UFO, noise, doesn't matter):
PointEvent {
latitude
longitude
}
For the example app, the events are simulated from a raw flat file:
46.5 7.2
47 7.2
46 7.2
46.5 7.2
47 7.2
46.3 7.2
...
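
As a minimal sketch of how such a file could be turned into position events, the Java below parses each line into a PointEvent. It assumes each line holds a latitude followed by a longitude separated by whitespace (inferred from the sample above, not confirmed by DFRC). The class and method names (FlatFileSketch, parseAll) are hypothetical, and it does not use the Esper API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: turns "latitude longitude" text lines into position events. */
final class FlatFileSketch {

    /** Hypothetical Java counterpart of the PointEvent shown above. */
    static final class PointEvent {
        final double latitude;
        final double longitude;

        PointEvent(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /** Parses each line; lines without exactly two numeric fields are skipped. */
    static List<PointEvent> parseAll(List<String> lines) {
        List<PointEvent> events = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2) {
                continue; // blank or malformed line
            }
            try {
                events.add(new PointEvent(Double.parseDouble(fields[0]),
                                          Double.parseDouble(fields[1])));
            } catch (NumberFormatException e) {
                // Non-numeric content: ignore the line rather than fail the feed.
            }
        }
        return events;
    }

    public static void main(String[] args) {
        for (PointEvent e : parseAll(Arrays.asList("46.5 7.2", "47 7.2", "46 7.2"))) {
            System.out.println(e.latitude + " " + e.longitude);
        }
    }
}
```

In a real deployment the events would come straight from the sensors, and each parsed event would be handed to the CEP engine as it arrives instead of being collected in a list.
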
The challenge is to identify all the flight paths in real time and eliminate noise, so as to figure out where the aircraft are, what their flight paths are, and whether further investigation has to be performed by humans or downstream systems.
It is all about turning real-time raw event streams into situational awareness.
A flight path is a directed sequence of position events that is extremely likely to represent a real aircraft trajectory. It will be displayed on an interactive map, and specific thresholds of the data fusion detection algorithm can be tuned in the client side application.
Relying on CEP and Data Fusion concepts ensures we can scale to a large number of aircraft and a high throughput of position events, and truly empowers the business users, turning a raw stream of latitude/longitude tuples into a rich system.



The Solution

The solution is implemented using Esper. Esper combines
  1. A full featured EPL - event processing language. For simplicity, it can be thought of as an SQL-like language augmented with time and causality. Main point here: this is a continuous query paradigm, not a repeatedly executed query, and there is no database.
  2. An efficient, feature rich CEP engine, implemented in Java (also available in .Net/C#). Refer to the docs, presentations and website for more details and usage scenarios.
  3. An open middleware platform with open APIs, leveraging existing standards, that can be integrated into an existing infrastructure.
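
To make the continuous query idea from point 1 concrete, here is a hand-rolled sketch in plain Java. It is not Esper's API and not EPL: the class name ContinuousCountSketch, the onEvent method, and the window length are all assumptions chosen for illustration. The query (a count over a sliding time window) is registered once, and every incoming event pushes a fresh result to the listener. Nothing is stored in a database and nothing is polled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongConsumer;

/** Illustrative only: a continuous count over a sliding time window. */
final class ContinuousCountSketch {
    private final long windowMillis;
    private final LongConsumer listener;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    ContinuousCountSketch(long windowMillis, LongConsumer listener) {
        this.windowMillis = windowMillis;
        this.listener = listener;
    }

    /** Called once per incoming event; the result is pushed, never polled. */
    void onEvent(long timestampMillis) {
        timestamps.addLast(timestampMillis);
        // Evict events that have fallen out of the time window.
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() <= timestampMillis - windowMillis) {
            timestamps.removeFirst();
        }
        listener.accept(timestamps.size());
    }

    public static void main(String[] args) {
        ContinuousCountSketch query = new ContinuousCountSketch(
                1000, count -> System.out.println("events in last second: " + count));
        for (long t : new long[] {0, 500, 900, 1200, 2500}) {
            query.onEvent(t);
        }
    }
}
```

This sketch only re-evaluates when an event arrives. A CEP engine would typically also update results as time passes, which is part of what "augmented with time" means above.
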
The algorithm designed by DFRC can be summarized as below (from their case study). It also leverages key Esper capabilities, such as calling existing geodesic distance and azimuth delta computation libraries directly from EPL.
"Basically the algorithm was written to correlate events that are close enough together in distance and direction during a specific time frame.
Those events are considered as a potential flight path. Once it correlates events, it builds a flight path between connected events.

The algorithm compares flight paths, if any two paths share start or end points, which would mean that we have a longer flight path containing 3 points.

It then checks any 3 point flight path measuring the azimuth difference from the 1st to 2nd and the 2nd to 3rd. If difference is less than a predetermined number of degrees we consider it an identified aircraft."
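
The three-point check in the quote above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not DFRC's implementation and not Esper code. The haversine distance and initial-bearing formulas stand in for whatever geodesic library DFRC actually reused, the time frame condition is left out, and the names (FlightPathSketch, isAircraft) and threshold values are hypothetical.

```java
/** Sketch only: distance and azimuth check over three position events. */
final class FlightPathSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    static final class Point {
        final double lat;
        final double lon;

        Point(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance (haversine formula), in kilometres. */
    static double distanceKm(Point a, Point b) {
        double lat1 = Math.toRadians(a.lat);
        double lat2 = Math.toRadians(b.lat);
        double dLat = lat2 - lat1;
        double dLon = Math.toRadians(b.lon - a.lon);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }

    /** Initial bearing from a to b, in degrees within [0, 360). */
    static double azimuthDegrees(Point a, Point b) {
        double lat1 = Math.toRadians(a.lat);
        double lat2 = Math.toRadians(b.lat);
        double dLon = Math.toRadians(b.lon - a.lon);
        double y = Math.sin(dLon) * Math.cos(lat2);
        double x = Math.cos(lat1) * Math.sin(lat2)
                - Math.sin(lat1) * Math.cos(lat2) * Math.cos(dLon);
        return (Math.toDegrees(Math.atan2(y, x)) + 360) % 360;
    }

    /** Smallest angle between two bearings, in degrees within [0, 180]. */
    static double azimuthDelta(double first, double second) {
        double d = Math.abs(first - second) % 360;
        return d > 180 ? 360 - d : d;
    }

    /** True when both hops are short enough and the heading barely changes. */
    static boolean isAircraft(Point p1, Point p2, Point p3,
                              double maxHopKm, double maxTurnDegrees) {
        boolean closeEnough = distanceKm(p1, p2) <= maxHopKm
                && distanceKm(p2, p3) <= maxHopKm;
        double turn = azimuthDelta(azimuthDegrees(p1, p2), azimuthDegrees(p2, p3));
        return closeEnough && turn < maxTurnDegrees;
    }

    public static void main(String[] args) {
        Point a = new Point(46.0, 7.2);
        Point b = new Point(46.5, 7.2);
        Point c = new Point(47.0, 7.2);
        // Thresholds (60 km per hop, 20 degrees of turn) are made up for the demo.
        System.out.println("steady northbound track: " + isAircraft(a, b, c, 60.0, 20.0));
        System.out.println("back-and-forth noise:    " + isAircraft(a, b, a, 60.0, 20.0));
    }
}
```

In the demo these conditions are expressed as continuous queries, so a candidate path is evaluated as soon as its third point arrives rather than in a batch pass over stored data.
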

Demonstration and Conclusion

Run the live demo (Java WebStart)
Quick howto:
  1. Accept the Java Web Start security dialog
  2. The rich client application launches and displays a satellite map
  3. Click the Start button
Blue dots on the map are the raw position events. Red lines are the flight paths identified from the raw data. Green arrows are the identified aircraft.


Read more in the DFRC AG case study
Read more about Esper

The demo is entirely databaseless and serverless and fits in just 3 MB of binaries. The very same concepts implemented here with Esper can of course be pushed out to the real world, with full control over the architecture, its scalability and its integration with a fully fledged sensors/server/client setup.
Congratulations to DFRC for putting it together.

Monday, October 13, 2008

Complex Event Processing with Esper making its way to Java community

There has been some nice coverage around Esper and Complex Event Processing (CEP) in the Java community recently:
  • Complex Event Processing with Esper, on DZone, by OCI Inc.'s Paul Jensen. A nice introductory article that covers the basics of Esper 2.x.
  • Open Source SOA, an upcoming book from Manning by Jeff Davis, due in March 2009 - see the interview on DZone. The book will cover CEP with Esper, alongside ESB, SCA, BRMS, and BPM, all with open source solutions.
"Jeff: Complex Event Processing is a somewhat emerging technology, at least in the open source space. With CEP, real-time business events are sent to the engine, which can then use correlation and pattern matching rules to determine whether any anomalies are occurring within your enterprise. The real-time analytical engine is what differentiates CEP from other more traditional BI vendors, which tend to evaluate events after-the-fact. CEP is very exciting, and can be used for anything from compliance, monitoring service levels, to real-time trending."
It's great to see that, after the core Esper team - including myself - spent the last 2 years evangelizing through articles and talks across O'Reilly, The Server Side, InfoQ and JavaOne, it is now being taken to the next level through independent contributions. That is one of the great outcomes of making things available under an open source model.