It seems that a few months after I started my IT Journeyman blog, I acquired an online doppelgänger. That's OK because I'm not the jealous type.
Having skimmed said blog, I found that the following post http://www.itjourneyman.com/2010/01/16/data-warehouse-2nd-time-is-a-charm caught my interest. It's essentially a rehash of a few white papers on "pitfalls/mistakes to avoid when building data warehouses". The long and the short of the post is that your first data warehouse will be a failure, but don't worry, because the second one will learn from those lessons and succeed.
I'd love to say that this was true, but IMHO it's just not that simple. In my travels I've worked on first-stab data warehouses that have been blinding successes, and also third tries that have had no more luck than their predecessors.
There are lots of elements that go into making a data warehouse project succeed or fail, and often the initial expectation-setting exercise is crucial. We have to be very careful in determining the criteria for what makes a data warehouse work and what doesn't.
It's a bit like marriage and divorce. Most people would assume that, by definition, a fifty-year marriage must have succeeded - but what if the husband and wife were at each other's throats for the duration? Likewise, divorce after ten years is seen as failure, but what if you've produced a couple of wonderful, well-adjusted kids and gone your own ways amicably? Expectation is everything.
What I can say is that in my experience Data Warehouse projects are difficult, and that's why I choose to work in that field and not in implementing somebody else's off-the-shelf package.
Data Warehouse projects are voyages of discovery, and it's what we learn along the way, not necessarily where we end up, that's really important. The problem is that most organisations and most PMs just don't understand that yet.
Tuesday, February 2, 2010
7x24
If you've been around in IT for a while you've probably come across the term 7x24, meaning 100% system uptime.
I was once employed in London by an Investment Bank as a DBA, where we were developing a mission-critical global options trading system. Luckily the data volumes were small, the servers and environments were stable, and I'd had plenty of time to work through a reliable hot-standby failover solution with an excellent UNIX Sysadmin. All was good in my world.
Then during the preparations for go-live the topic of Availability arose. The Project Manager threw into the mix that we had to guarantee 7x24 availability.
My response was that we could aim for 100% uptime excluding planned outages, but that we couldn't guarantee it. This resulted in a bit of table-thumping, as was quite often the case on IT projects in an Investment Bank.
Suffice it to say that when I explained the costs and complexities involved in guaranteeing that high an availability, from both the solutions side and the human side, the PM became a bit more reasonable, especially when I threw in the fact that neither Scott McNealy nor Larry Ellison could guarantee 100% uptime on the configuration of Solaris and Oracle on which the solution was constructed.
So the lesson is that before you start discussing High Availability, the metric that needs to be understood is the actual cost to your business, either in dollars or reputation, of the mission-critical app being unavailable. Until you have that, there's really no point in discussing the HA requirements of the system. The funny thing is that when I was consulting I designed lots of Technical Architectures, and never once could I get that figure out of the client.
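The two numbers behind that conversation are easy to sketch: the downtime implied by an availability target, and what that downtime costs. A minimal back-of-the-envelope version (the $50k/hour figure is a made-up example, not a real client number):

```python
# Downtime implied by an availability percentage, and its annual cost.
# The cost-per-hour input is a hypothetical example value.

HOURS_PER_YEAR = 365 * 24

def downtime_hours_per_year(availability_pct):
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

def annual_downtime_cost(availability_pct, cost_per_hour):
    """Expected annual cost of unavailability at a given hourly cost."""
    return downtime_hours_per_year(availability_pct) * cost_per_hour

for pct in (99.0, 99.9, 99.99):
    hrs = downtime_hours_per_year(pct)
    print(f"{pct}% uptime -> {hrs:.2f} hours down/year "
          f"-> ~${annual_downtime_cost(pct, 50_000):,.0f}/year at $50k/hour")
```

Even 99.9% uptime still concedes nearly nine hours of downtime a year; until the business has put its own cost-per-hour figure into that equation, "7x24" is just a slogan.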
Kindle Surprise
Today I saw my second Kindle on my commute home. Unlike my first encounter, I didn't feel that it was a LOL moment, but neither did I come away with any sense of envy regarding the device. I'd probably categorise it as an interesting piece of technology, but one that I'll pass on.
Monday, February 1, 2010
Assisting the Police with their inquiries
Back in 1998 I was doing some Pre-Sales Consulting for an Account Manager trying to sell a Data Warehouse solution to a local state police force. I badgered the salesman to let me use the above title as a tagline for the demo, but unsurprisingly he didn't see the funny side.
During the demo the thorny question of Metadata came up. More precisely - Consolidated Metadata. As I'd just come off a project where I'd defined the Metadata Architecture and Solution, I was well qualified to answer the query.
At the time we had three sources of metadata for our solution. These were:
- The Database Data Dictionary
- The CASE/Data Modeling tool in use
- The ROLAP Semantic Layer
Note that we didn't use an ETL product, which would have been a fourth source of Metadata.
Now the interesting thing here is that all the software was written by the same company in the same software labs, so one would hope that some level of shared metadata would be possible. Alas, no. Not only did the metadata in each repository overlap, but there was no easy way of combining it into a single consolidated metadata repository.
I answered the question honestly: nobody had a good story here, not us nor our competition. I think the client appreciated my honesty. The account manager, obviously not wanting to leave a bad impression, did what all account managers are prone to do and started promising vaporware, with some cock-and-bull story about the software labs in California working on that problem.
The interesting thing is that here we are over a decade later and I've still to see a good answer to this problem.
Taking the Mountain to Mohammed
I've been working in the field of Data Warehousing for some 13 years now. Actually, my first ever data warehouse was a Reporting System I built back in 1992, long before I'd ever heard the terms DW & BI, but that's another story.
The interesting thing that, so far, has been a constant in all that time, no matter what style of Data Warehouse (from full-blown Inmon Corporate Information Factory to Kimball Federated Data Marts), is that we extract data from source systems, move it, and load it into a data warehouse (be it an EDW, Data Mart, ODS, RDS, whatever). We'll use terminology like ETL, OLAP, ROLAP, Cubes, Star Schemas, Metadata, Slowly Changing Dimensions, etc. along the way to baffle the business and make ourselves seem clever, but fundamentally any data warehouse or data mart involves moving data from a source system into a target reporting system.
Back in the 90s this made perfect sense, because it was inconceivable that we could slap resource-consuming queries and reports against the mission-critical core business systems.
Nowadays that's just not the case. There are many technical solutions out there that could enable us to place a large and significant batch query and reporting load against our production data with zero impact on the core business systems. Technologies that spring to mind include Server Virtualisation, Disk Replication and Mirroring, and O/S and Database Parallel Server technologies.
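The simplest shape this idea takes is routing: pure reads go to a replicated copy of production, anything that writes goes to the primary. A hypothetical sketch (the DSN strings and the naive first-keyword test are my own illustrative assumptions, not a real product's API):

```python
# Hypothetical routing sketch: reporting reads hit a replicated copy of
# production; writes stay on the primary. DSNs are illustrative only.

PRIMARY_DSN = "host=prod-db dbname=trading"     # mission-critical primary
REPLICA_DSN = "host=report-db dbname=trading"   # mirrored/replicated copy

def choose_dsn(sql: str) -> str:
    """Send pure reads to the replica; anything else goes to the primary."""
    first_word = sql.lstrip().split(None, 1)[0].upper()
    return REPLICA_DSN if first_word == "SELECT" else PRIMARY_DSN

print(choose_dsn("SELECT count(*) FROM trades"))   # replica
print(choose_dsn("UPDATE trades SET qty = 1"))     # primary
```

In a real system this decision would live in a connection pool or driver layer rather than in application code, but the principle is the same: the reporting workload never touches the box the business depends on.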
The question is: why don't we employ these technologies? I suspect that in the field of DW & BI we're stuck in a Kimball or Inmon rut, and that for the time being we will continue to Take the Mountain to Mohammed.
Ah, but what about history, I hear you ask? Well yes, it's true that we often capture history in the data warehouse that we cannot keep in our online systems, but often the need and justification for history is overstated. Besides, another way in which we could keep all the history we'd ever need (and we probably already do this to some degree anyway) is to ensure that all the PDF reports that are produced are kept online in some fashion. There are alternatives if we are creative.
Maybe within the decade we'll see a shift away from this and let Mohammed walk to the mountain for a change.
Where is the Henry Ford of IT?
I can only imagine that Henry Ford was an amazing man. His invention of the production line is probably the greatest commercial achievement of the 20th Century. Here's my question for today: does IT need its own Henry Ford?
If we apply the production-line analogy to IT development we will traditionally end up with something based upon the SDLC (System Development Lifecycle). That's fine as far as it goes, but as always the devil is in the detail. In all my 20 years of industry experience I've never been able to use the same development process unchanged between sites. Think about that for a moment. Every IT Development Team has had different processes, standards, tollgates, etc., but we're all effectively doing the same thing. I should be able to move from one job to another and, technology aside, be instantly productive. However, as new developers at a site we spend longer dancing our way through the process minefield than we ever do writing code. Something just isn't right there.
Rancid Aluminium2: 26 billion smackers down the gurgler
In a recent report it was stated that the UK Govt under NuLabour had wasted about GBP26bn on failed IT initiatives. That's about GBP2bn for every year in office, with an ROI of zero. Just think about that for a moment.
Maybe some of that money was wasted on building 1,700 websites, of which only 431 will remain by the end of 2010 after a recent audit recommended that most be culled.
Meanwhile, James Cameron spent about GBP180mn over four years making 'Avatar', whilst pioneering new technology, and got an ROI of over GBP1bn in less than three months.
Honestly, the UK would have been wiser investing this money in Cameron's Lightstorm and Peter Jackson's WETA, and conservatively could have made a profit of GBP100bn, which is half the money that the BofE has printed with its policy of Quantitative Easing to bail out the banks.
OK, it doesn't work like that, and we know that when UK Government money finds its way into the arts (via Lottery funding) we end up with films like 'Rancid Aluminium' and not Cameron's smash hit.
What puzzles me is that the government still tries to run large IT projects at all, because everyone knows that they're just a licence for government-approved suppliers to print money.
I don't know how much the US has spent on intelligence-related IT projects post-9/11, but what I do know is that they failed to stop a known terrorist suspect from boarding a flight on Christmas Day.
So what's my point?
Over the last decade I've had the pleasure of working with two managers who successfully defined how they would structure and govern large projects in order to avoid the profligate wastage seen in government IT spend.
The first even wrote a thesis about how large projects are inherently more difficult and risky to land than smaller ones. The second built an IT governance framework that consisted of a few simple groundrules:
- all projects to be sponsored by the business without exception
- no project to last more than 9 months. Any piece of work identified as larger than this would be broken into phases of less than 9 months' duration.
- no project to cost more than GBP2m
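Groundrules this crisp could even be enforced mechanically at project intake. A sketch of what that gate check might look like (the Project record and its field names are my own invention, not the framework the manager actually built):

```python
# Hypothetical automated gate check for the three groundrules above.
# The Project fields and thresholds are illustrative inventions.

from dataclasses import dataclass

@dataclass
class Project:
    name: str
    business_sponsor: str   # empty string = no sponsor identified
    duration_months: int
    budget_gbp_m: float     # budget in GBP millions

def gate_check(p: Project) -> list:
    """Return the groundrules the project breaks; an empty list means approved."""
    failures = []
    if not p.business_sponsor:
        failures.append("no business sponsor")
    if p.duration_months > 9:
        failures.append("longer than 9 months - break into phases")
    if p.budget_gbp_m > 2:
        failures.append("costs more than GBP2m")
    return failures
```

The point isn't the code, of course; it's that rules simple enough to automate are also simple enough for a steering committee to apply without exception.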
It sounds simple, and yes, it works. But what about when you need to do the big projects? Well, I guess we probably need Project Managers of the calibre of James Cameron for that; otherwise you're better off saving your pennies for bailing out a bank or two.