Good Data & Cloud Computing
Link: http://www.gooddata.com
A new startup called Good Data, funded in part by Marc Andreessen (formerly of Netscape & Mosaic fame), extols the virtues of collaborative Business Intelligence and of placing the platform on a cloud computing environment for scalability as usage grows. It leverages SOA and Web 2.0 technology to deliver a hosted solution that is as functional as, if not more so than, pre-existing BI platforms like OBIEE, Business Objects, Cognos, etc. And best of all, it's free to try!
I have yet to play around with this, but I'll be taking a CSV file of raw data and trying to build reports on top of it, including testing ad hoc analytics. With Good Data, you can source multiple data sets, including Amazon S3 buckets. Its claim to access different data sets via its API needs to be tested, but the limitation is obvious when the source is a database: the data has to be transferred over the Internet (securely, of course) into Good Data's distributed system. I'm guessing Hadoop is their storage and processing model for distributing the workload of storage, retrieval and light processing of the data.
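Before uploading anything, it helps to compute a few totals locally so there is a baseline to compare the hosted reports against. Here is a minimal sketch using only Python's standard library. The column names (`region`, `product`, `revenue`) and the sample rows are made up for illustration, and nothing here touches Good Data's API:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for the raw CSV file.
SAMPLE_CSV = """region,product,revenue
East,Widget,100.0
East,Gadget,50.0
West,Widget,75.5
West,Widget,24.5
"""


def total_by(rows, dimension, measure):
    """Sum `measure` grouped by `dimension`, like a very simple ad hoc report."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += float(row[measure])
    return dict(totals)


# csv.DictReader yields one dict per data row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
revenue_by_region = total_by(rows, "region", "revenue")
revenue_by_product = total_by(rows, "product", "revenue")
```

Swapping `io.StringIO(SAMPLE_CSV)` for an opened file gives the same totals for a real extract, which can then be checked against whatever the hosted report shows.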
It will be interesting to see how this new upstart plans to break into a market that has been dominated by a handful of BI players. And given the hosted model here, how long will it be before we see this offered as a standalone enterprise application that can be purchased by customers looking to manage their BI internally? Financial institutions would NOT want a hosted solution for this, and they are a market that would be good to monetize. What plans Good Data has are not clear, but hopefully they sort this out and pick their target markets wisely.
Oracle Business Intelligence Enterprise Edition & Microstrategy Initial Comparisons
Having had a little more exposure to the Microstrategy application stack, specifically Version 8, I find they are fairly similar platforms. The differences come down to how you create Microstrategy (MSTR) Projects, which are essentially Logical Models within the Business layer in OBIEE. Metrics & Facts take on slightly different meanings within the MSTR world, as they are fairly flexible in nature. The organization of things into multiple folders within an MSTR Project is a little daunting and confusing at first, but it makes some sense if you focus on everything relative to the Project itself. If you think of an MSTR Project as a single OBIEE Subject Area, then you can make the association.
The MSTR analytical engine is fairly similar to OBIEE's in that it tries to determine the optimal query to construct based on the definitions of the MSTR Project objects. The architecture of the physical data model, however, requires a slightly different approach.
One thing I started to investigate is that Microstrategy likes to snowflake tables off dimensions. This is a little different from the nearly complete denormalization of dimension tables within the Oracle Business Analytics Warehouse data model. I have had limited exposure to the reasons behind this, but I believe there are cases where snowflaking, say, a type table off a dimension leads to fact table designs that make it easier later on to create aggregates that join to the snowflake table at a higher level of granularity rather than referencing the dimension table. The MSTR analytical engine chooses the appropriate table to join to the fact table of the proper grain through its own implementation of aggregate awareness.
OBIEE has had aggregate awareness for quite some time now (at least 5 years) within its main server engine. I also found it a little more intuitive: you define the level of granularity for each Logical Table Source (LTS) and reference the appropriate hierarchy column within a dimension, and the server determines which LTS to query against. If an appropriate column exists within the dimension table to join to the aggregated fact table, this design decision allows OBIEE to implement aggregate awareness in a much more efficient manner.
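To make the idea of aggregate awareness concrete, here is a toy sketch of the selection logic: given the dimension levels a query asks for, pick the smallest fact source whose grain is still detailed enough to answer it. The table names, hierarchy levels and row counts are all invented for illustration, and this is not how either the OBIEE server or the MSTR engine is actually implemented.

```python
from dataclasses import dataclass

# Levels within each hierarchy, ordered from most detailed to most summarized.
HIERARCHIES = {
    "time": ["day", "month", "year"],
    "product": ["sku", "category"],
}


@dataclass
class FactSource:
    name: str
    grain: dict      # dimension -> level the table is stored at
    row_count: int   # rough size, used to prefer the smallest usable table


def can_answer(source, requested):
    """True if the source is stored at the requested level or finer for every dimension."""
    for dim, level in requested.items():
        if dim not in source.grain:
            return False
        levels = HIERARCHIES[dim]
        if levels.index(source.grain[dim]) > levels.index(level):
            return False  # source is more summarized than the query needs
    return True


def pick_source(sources, requested):
    """Choose the smallest source that can still answer the request."""
    candidates = [s for s in sources if can_answer(s, requested)]
    if not candidates:
        raise LookupError(f"no fact source can answer {requested}")
    return min(candidates, key=lambda s: s.row_count)


# Hypothetical base fact plus two aggregates.
SOURCES = [
    FactSource("sales_fact", {"time": "day", "product": "sku"}, 50_000_000),
    FactSource("sales_agg_month_sku", {"time": "month", "product": "sku"}, 3_000_000),
    FactSource("sales_agg_month_category", {"time": "month", "product": "category"}, 20_000),
]

yearly_by_category = pick_source(SOURCES, {"time": "year", "product": "category"})
monthly_by_sku = pick_source(SOURCES, {"time": "month", "product": "sku"})
daily = pick_source(SOURCES, {"time": "day"})
```

A yearly report by category lands on the small month/category aggregate, while anything at the day level falls through to the base fact table. In OBIEE terms, the `grain` dictionary plays roughly the role of the level settings on each LTS.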
I'll dig more into how the Microstrategy analytical engine handles this with the configuration in the Project.
Another difference is that the MSTR metadata resides within its own database schema, whereas the OBIEE metadata resides in a single file. There are pros and cons to both methods. An older BI platform I worked on over 10 years ago, Information Advantage DecisionSuite (RIP IA!), was very similar to Microstrategy in centralizing the metadata storage in a database schema. Storing metadata in a database rather than a file means that migrations are a little more cumbersome to conduct within Microstrategy. In OBIEE, a migration requires copying a single file onto the target server. Within Microstrategy, the Object Manager needs to push the changes from one environment to another, meaning that it does database reads and writes from one metadata schema to another, possibly in another database as well (if you architect it according to best practices and keep the target server metadata in a separate schema or database instance altogether). This, however, ensures that proper checks and balances are enforced by the platform to preserve the integrity of the metadata components.
I'll post some more thoughts as I come across them. Feel free to poke holes in my observations though and comment on the comparisons. More to come!
New Job, New BI Platform, Some Interesting Comparisons to Come
I recently took a new position at a company as a Full Time Employee (FTE as we're called now), ditching the consultant life of going from one client to another, and I'm trying to take what I've learned and experienced over the years with Business Intelligence and OBIEE and apply it to this company.
The platform they are using is Microstrategy, which, as some of you (or most of you astute readers) will know, is one of the oldest ROLAP Business Intelligence tools on the market. It's on version 9 now, so I'll have quite a bit of exposure to the Intelligence Server architecture, the platform itself, the Reporting engine and the Administration portion of it to compare with OBIEE's parallel offerings.
I've already noticed a few differences between the 2 platforms but am surprised at the number of similarities. In the next several posts, I'll probably do some comparisons of the 2 platforms and invite comments about the pros and cons of each.
All things being equal, OBIEE and Microstrategy aren't that far apart in terms of what they actually offer businesses: the ability to visualize their data and gain value and insight into their organization in order to manage their business more effectively and efficiently. It's just HOW they go about taking care of business that will be interesting to compare.
Oh, and BTW, why aren't there more resources on Microstrategy (such as blogs and forums) than there are available for OBIEE?
To Cache or Not to Cache, that is the question? (Pt 2)
So where does the OBIEE Server Cache give you the best bang for the buck? Do you have a day to listen to a number of differing opinions on this? Nothing is quite as controversial as determining when or when not to apply report caching at the OBIEE Server level.
Hopefully you have set up most, if not all, of your dimensions in the RPD to be cacheable. This is probably the most basic thing you would want to do from the initial design of your RPD. These objects are the least likely to change frequently (and I'm generalizing, because it really depends on your Data Warehouse implementation). Most likely, the facts are on the same refresh frequency as your dimensions, so it'd also make sense to enable caching on them as well. (Note: I would not recommend doing this if your DW has a very rapidly changing incremental update process, such as a refresh every hour.)
If you follow John Minkjan's scripts (or any other custom script) to manage and purge the cache appropriately, you'd be effectively and efficiently helping performance in a number of areas like:
- Dashboard Prompts on dimensional attributes
- Answers queries on dimensional attribute lists of values used for filtering
- Commonly referenced base measures from fact tables
- Basic common dashboard reports
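
As a rough idea of what a purge step after an ETL load might look like, here is a sketch that builds a script of `SAPurgeCacheByTable` calls for the tables that were just refreshed, plus the `nqcmd` command line that would run it. This is not John Minkjan's script, only an illustration of the general approach. The DSN, user, database, catalog, schema and table names are all placeholders, and the exact procedure arguments should be checked against the documentation for your OBIEE version.

```python
def purge_statements(tables):
    """Build one SAPurgeCacheByTable call per refreshed physical table.

    Each entry is (database, catalog, schema, table) as named in the RPD
    physical layer. Values are quoted as-is; this sketch does no escaping.
    """
    statements = []
    for database, catalog, schema, table in tables:
        args = ", ".join(f"'{part}'" for part in (database, catalog, schema, table))
        statements.append(f"Call SAPurgeCacheByTable({args});")
    return statements


def nqcmd_command(dsn, user, password, script_path):
    """Argument list for running the purge script through nqcmd (not executed here)."""
    return ["nqcmd", "-d", dsn, "-u", user, "-p", password, "-s", script_path]


# Hypothetical tables touched by last night's load.
REFRESHED = [
    ("DW", "CAT", "SALES", "DIM_CUSTOMER"),
    ("DW", "CAT", "SALES", "FACT_ORDERS"),
]

script = "\n".join(purge_statements(REFRESHED)) + "\n"
command = nqcmd_command("AnalyticsWeb", "Administrator", "secret", "purge_cache.sql")
```

Writing `script` to `purge_cache.sql` and passing `command` to `subprocess.run` at the end of the ETL job would purge only the entries that depend on the reloaded tables. A blunter alternative is a single `Call SAPurgeAllCache();`.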
Once you get beyond these very basic areas, caching may or may NOT help at all. My colleague has a good write up on why certain things may not be cached in OBIEE here.
Most implementations of OBIEE center around a tight security model. With caching enabled, most dashboard reports are run through the OBIEE security model that has been put in place. As of now, the OBIEE Server caches the results for a specific user ID but does not share the results of this cache with other user IDs. To do so would be a nice little breach in security, wouldn't it? In general, though, the best practice to get around this is to execute the Dashboard report that is commonly viewed by all users under a General user (most likely an RPD user) and cache the results for all other users. This is all well and good, except when you have data-level security on the dimension/fact tables implemented within the RPD that causes each of these common reports to generate a query unique to the user. Then the caching works for each individual user but not for the rest of the OBIEE users.
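A toy model shows why data-level security defeats the seeding trick. Assume the cache key is the report plus whatever security filter gets applied for the user. That key design is my simplification for illustration, not the actual OBIEE cache implementation, and the report name, users and filters are invented.

```python
class ResultCache:
    """Toy result cache keyed by (report, security filter)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def run(self, report, security_filter=None):
        key = (report, security_filter)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1  # this is where the database would be queried
            self._store[key] = f"rows for {report} where {security_filter or 'no filter'}"
        return self._store[key]


REPORT = "Revenue by Region"
USERS = ["alice", "bob", "carol"]

# Case 1: no data-level security. Seed once as the General user, everyone hits.
shared = ResultCache()
shared.run(REPORT)                      # seeding run
for _ in USERS:
    shared.run(REPORT)

# Case 2: data-level security adds a different filter per user.
FILTERS = {"alice": "region = 'East'", "bob": "region = 'West'", "carol": "region = 'North'"}
secured = ResultCache()
secured.run(REPORT)                     # seeding run helps nobody
for user in USERS:
    secured.run(REPORT, FILTERS[user])  # every user's first run is a miss
secured.run(REPORT, FILTERS["alice"])   # only alice's own rerun is a hit
```

In the first case one trip to the database serves all three users. In the second, the seeded entry is never reused and each user pays for their own first run.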
The frustrating thing is that no amount of planning lets developers predict which queries get cached, for which user, and with which set of dashboard prompt values filtering each report. For instance, allowing individual users to "Save Current Selections" on a Dashboard page rules out any predictability about what to cache. Planning which users run which reports and seeding the cache for those users is a maintenance nightmare, especially if your OBIEE user base is quite large.
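Some back-of-the-envelope arithmetic shows how quickly the seeding problem grows. The numbers below are hypothetical, and the sketch assumes every prompt is single-select and every user has a distinct security filter; multi-select prompts only make it worse.

```python
from math import prod


def seed_count(n_users, n_reports, prompt_values):
    """Worst-case number of distinct cache entries to seed.

    prompt_values maps each dashboard prompt to how many values a user can pick.
    """
    return n_users * n_reports * prod(prompt_values.values())


# Hypothetical deployment: 500 users, 20 common reports, three prompts.
PROMPTS = {"region": 5, "quarter": 4, "product_line": 10}
entries_to_seed = seed_count(500, 20, PROMPTS)
entries_without_security = seed_count(1, 20, PROMPTS)
```

Even this modest setup works out to two million entries when each user needs a private copy, versus four thousand when one General user can seed for everyone.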
What do we do about this? I'm going to let you follow the path to Mark Rittman's excellent suggestions here.
Any OBIEE and Data Warehousing professional worth their salt should always push as much of the work down to the database level as possible, designing the most optimal data model possible and creating the necessary aggregations and performance tuning at the database FIRST before tackling performance optimizations (or tweaks!) at the OBIEE Server/Presentation level. Pouring all your knowledge into the OBIEE Server cache is not really optimization at all; it merely uses the cache as a crutch to mask the underlying issues.
To Cache or Not to Cache, that is the question? (Pt 1)
The OBIEE server has a feature for caching report results so that other users can retrieve the results of the reports fairly quickly, without putting unnecessary load on the database server for the same sets of queries. This has, IMHO, led some consultants and implementers to use it more than was intended, as a band-aid over what is the core of OBIEE: the relational database underlying it all.
Remember that OBIEE is NOT a MOLAP product with the ability to pre-generate multidimensional cubes into a highly optimized format for very quick retrieval of data. It is also not a static reporting tool, which is how I view the cache sometimes since the cached results are static until the cache is purged by some predetermined scenario(s). It sits on top of a relational database, where your Data Warehouse resides, and is meant for dynamic retrieval of this dataset.
Arguments that clients & customers have a fixed data model that cannot be altered, forcing us to resort to seeding the OBIEE server cache with report results, are just excuses, I believe.
I'm going to step onto my soapbox here for a second. The first thing to do is level-set the expectations and try to gauge whether their investment in OBIEE (and it is not a cheap investment!) is being made with the right intentions from a high-level perspective. Are we JUST providing reporting? OBIEE is not the tool for this. Analyzing the data, diving into it to get at the root causes of exceptions, and being able to take action, reactive or preventative, as a result of what the data is telling you is the key to OBIEE's purpose in life. It's not a pig on a stick or lipstick on a pig or whatever the new political allegory is nowadays. It serves a particular purpose within the enterprise and needs to be communicated that way from the get-go.
With that being said, caching does serve the purpose of relieving frequent trips to the database layer by allowing frequently used reports to be served up to a broad base of users fairly quickly. It is not for speeding up the results of the OBIEE application so that it pretends to be quick and nimble on its feet. Let's not delude the users this way, since they ARE smarter than some of us may be led to believe!
Performance tuning on the database level has ALWAYS been the key to ensuring that OBIEE performs at its optimum levels. This means having a solid and optimal data model in place for your Data Warehouse, given the requirements of what's being reported on, data volume growth patterns, and the metrics and attributes required for meeting reporting needs. It means having a solid DBA team to provide technological strategies such as DB configuration, both hardware and software, indexing strategies and partitioning strategies. It means having a solid OBIEE application administration team to constantly monitor the evolving nature of a Business Intelligence application, analyze the current usage and plan for future usage growth, as well as reporting scenarios the business is likely to need moving forward.
After all this is done, go ahead and start working on some caching strategies. That is why caching sits at the OBIEE application level and not in the backend layer.
There has been mention of security requirements coming into play that affect the requirements for OBIEE server caching. These also need to be handled carefully and properly in order to adhere to the security standards of your enterprise.
These are my personal thoughts on the matter, but there may be other scenarios that have valid merit for the extensive use of OBIEE server caching. I just have not encountered those scenarios yet.
Next time: Does the caching mechanism work? Aren't there limitations on how much can be cached? Which users are encapsulated within the scope of caching and which aren't?
04/24/09 12:11:58 am, 