Curious Mind Wanderings
Curious Mind Podcast: The Digital Agenda
The Digital Agenda: Data warehousing

The Digital Agenda: Data warehousing

Past Present & Future with Bill "Father of Datawarehousing" Inmon

Fireside Chat between Bill Inmon and Piyush Malik

Transcript

<Piyush> 

The digital agenda is all about data analytics, cloud, AI/ML, and emergent technologies that are transforming not only companies, but also entire industries and even our lives.

Hi, my name is Piyush Malik. I'm a curious mind who wears and juggles multiple hats: an engineer, a management consultant, a practitioner, a builder, a thought leader, an entrepreneur, and a C-suite executive.

I have been in the industry helping organizations realize and navigate the value of data analytics, applied AI, ML, and emerging tech. I have created strategies and transformation programs that help them compete with the digital natives. Those in the data industry have known about the elusive "single version of truth" concept that data warehousing promised to deliver.

That was over three decades ago. Those were the days when aggregating data from a variety of data sources in organizations meant a multi-year project. In my early days of BI and data warehousing, it was hard not to get influenced by the Inmon versus Kimball methodologies and approaches. If you were in the data industry then, you may recall the ideological battles between adopting star schemas and snowflake schemas, and so on and so forth.

In today's episode, we will meet the father of data warehousing, Bill Inmon, find out what he's up to these days, and peek into his crystal ball.

First, a little introduction about him. Yale graduate, author, and technology pioneer Bill Inmon is the founder of ForestRim Technology and is best known as the father of data warehousing.

Bill Inmon has been the most prolific and well known author worldwide in the big data analytics, warehousing, and business intelligence arena. He worked for American Management Systems and Coopers and Lybrand before 1991 when he founded his company, Prism Solutions, which he took public. And in 1995, he founded Pine Cone Systems and then ForestRim Technology, in addition to authoring more than 60 books and 2000 articles.

Bill has been a monthly columnist for a number of publications, including BI Network, EIM Institute, and Data Management Review. Bill was named by Computer World in 2007 as one of the top 10 IT people who mattered most in the last 40 years of the computer profession, and he has been recognized as a thought leader and for his lifetime of achievements by many organizations such as DAMA, Data Modeling Zone, and PMI. Bill's latest innovation is the development of textual ETL. So let's go and find out what he's up to these days.

<Piyush> Hello Bill, welcome to the Digital Agenda.

Who is Bill Inmon? I wanna hear from the man himself!

<Bill> Well, thank you, Piyush. I'm just a guy who lives in Denver, Colorado. I've got a wife and two dogs, and my dogs think of me as a person who takes the garbage out. And so I'm just a guy.

It's been my privilege to be an observer and a participant in the maturing of the data warehouse and data profession all these years.

<Piyush>

Well, Bill, to me, you are not an observer. You are an institution in yourself. Having written 65 books and more than 2000 articles in the data and data warehouse field, I mean, there must be something that motivates you to wake up every morning and either write or do something in the data industry. Tell us, what's your motivation?

<Bill> 

You know, Piyush, if I understood that myself, I would tell you. I don't understand it, but I feel very motivated to look at our profession from a standpoint of architecture. Now, there are not many people. There are a few, but there are not many people that have what I would call an architectural viewpoint.

And I think that the architectural viewpoint, quite frankly, is very, very valuable to people in our industry. But the truth of the matter is, our industry is a maturing industry. We are an immature industry. When I say immature, I don't mean that in the pejorative sense. I mean that in the historical sense.

Take a look at the IT profession and then compare it to medicine or accounting or engineering or any of the other professions. The walls of Rome were built by an engineer over 2000 years ago, and so the other professions are thousands of years old. So, it depends on where you start counting.

We're 60 or 70 years old, and so is our industry. People don't have this understanding. Somehow they think things are set in concrete. Things are not set in concrete. And so anyway, I am an architect. I'm probably the first data architect that there was, I admit to that!

But I look at things differently. Let me give you one quick example. A while back, I was invited to speak at a conference, and I speak at conferences all the time, so that's nothing new to me. I went to this conference, and it was a conference for IT management, and while I was speaking there was another speaker in another room.

I was speaking on getting business value out of data, which I believe to be a very important topic. In the other room, they were talking about some new device that had been created in Silicon Valley. And so all the IT managers at this conference had a choice to make as to who they wanted to listen to.

I had one IT manager come listen to me, and the other 90 or so IT managers were in the other room listening to how some new device was being announced. And it turns out that device doesn't even exist anymore. But the IT managers of the world are professionally immature people. They don't like to be called that, but they are.

And so anyway, I simply am a data architect trying to alert the world to the other ways of thinking, and that's what I do.

<Piyush> 

Wonderful. Well, I'm intrigued at the plight of those 99 folks who were unfortunate enough not to listen to you. But, you know, I have to recount my experience too here.

As you know, the very first time I saw you was in Phoenix, Arizona, at the TDWI conference, which obviously you were chairing. You were sitting on the big dais right next to Steve Ballmer, then CEO of Microsoft, and the room was full of more than 2000 people in person at that convention.

That was my first experience of a data warehousing conference, and I felt privileged enough to meet you. I met you a few more times at conferences subsequently. And one good memory of my conferences with you was sitting on the dais almost 10 years later at the San Francisco DAMA conference at the Genentech campus.

And we were talking of the same stuff: the value of data. At that time I was leading the BI, business analytics and optimization practice at IBM, and we obviously started from the business problem and how you solve a business problem with data, using what were at that time the modern methods of ETL and using ETL tools as opposed to coding by hand.

So that was a progress that they had made. And relating back to what you said the walls of Rome 2000 years ago, the engineering profession, very old, but the computer science profession, only about 60, 70 years. And within that, the data profession is relatively younger and it's still, I would say maturing.

It's not matured yet. It's evolving. And you've seen many epochs, so to say, of this industry. So I'll just pivot to the very first book I believe you wrote. Was it in 1992?

<Bill>

Actually, I think I wrote my first book in 1984.

<Piyush> 

Okay. Well, I got your first book in the late nineties, and since then there have been “Building the Data Warehouse”, “Corporate Information Factory”, etc., and now the “lakehouse” book, which obviously talks about the different eras of evolution of the data warehousing industry that you've been very much part of.

So, talk to me about this evolution. Was it coming from your own mind, or was it the market forces that you observed while working with clients that made you decide to write newer books, especially about the data lakehouse?

<Bill>

Well, okay, I'll be happy to tell you the evolution of it all.

A long time ago I was a writer for a journal called Computer World. I haven't seen Computer World lately, but once upon a time there was a journal called Computer World, and I was one of their writers. At that time, the IT profession was building transaction processing. You go into a bank, you go into a manufacturer, you go into an insurance company, and everybody was building transaction systems.

And I learned, I think I learned, I hope I learned a lot about transaction processing. But I also recognized that the data we were producing at that time could be used for multiple things. But it was the thought of the day, by everybody, that transaction processing data was only for transaction processing.

And so as a writer in Computer World, I started to write that, gee, we can do other things with the data other than transaction processing. Now, I've never understood why this was so, but this was very offensive to (some) people. You should see some of the letters that I got. I have a collection of them, and occasionally I'll go back through 'em.

I was told that I was sending our industry back 25 years. I was told that I should never be speaking in public again. I had letters from people with language that I can't repeat here and don't want to repeat. But somehow, suggesting that there was more to be done with data than just transaction processing was what I was saying.

And I don't know why this was so offensive, but I guarantee you that it was. And so the thought that you could use data for something other than just transaction processing, the thought that we needed not application data but enterprise data, was where I was headed. And those thoughts led me to the thought of a data warehouse.

Now, I have to tell you, the first data warehouse that I built was not perfect. Over the years, I've learned a lot about how to build a good data warehouse. But nevertheless, the first data warehouse that we built was actually very effective and very useful. And then from that, the first book came.

Now, in that day and age, people were dealing with structured data, and structured data was where it was at! In today's world, we're dealing with a lot more kinds of data. In addition to structured data, we also have textual data, and textual data, quite frankly, in terms of business value, holds more than structured data does.

People haven't discovered that yet, but they will. We also have analog IoT data and data that comes from machines, and it too has its own. But these different kinds of data are in fact very different in terms of their characteristics, how you use them, how you handle them, what you do, et cetera, and so forth.

Now, a while back, somebody came up with something called a data lake, and the idea behind the data lake was you take the data, you throw it into your data lake, and people can use it for analytical processing. And between you and me, the data lake was one of the worst ideas to ever enter our profession.

It is, quite frankly, a stupid idea. Somebody that didn't know what they were doing invented the data lake. Because what happens is, when you start to just throw data into the data lake, it turns into a data swamp.

<Piyush> 

Absolutely. Absolutely. And the quality can’t be attained, and you're just blindly putting it in without thought. So yeah, please go on.

<Bill> 

Or a data sewer. A swamp or a sewer or whatever. But it's not a nice place, because it's a liability, not something useful. A data lake is a stupid idea. But a lot of people listen to their vendors. Now, this is part of the maturity of our industry.

Part of the maturity of our industry is listening to vendors. Vendors are there to sell their product. That's what they're there for. At any rate, the data lake appeared, and I thought it was an atrocious idea. So I said, what do we need to do to data lakes to make data lakes useful? People are doing data lakes, but they're doing something that's really dumb.

So I contacted my friends at an organization called Databricks, and they are leaders in the world of the data lakehouse. And so I decided we need to figure out a way to help people out of the mess that the world of data lakes has made.

So that's where the idea of a data lakehouse came from. And the first book that I wrote with Databricks was a very successful book, but I realized after reading the first book myself that there was a lot more to be said about the idea of a data lakehouse. By the way, I got an article today, somebody saying the data lake is the same thing as the data lakehouse.

This is a person who hasn't read anything. They need to know what they're talking about, because they may sound the same. A data lake may sound like a data lakehouse. It's not the same thing at all. And so, anyway, we just published with a fellow named Ranjeet Srivastava, my friend and co-author in India.

We just produced a new book, because we realized the first book on the data lakehouse was a good book, but it only introduced you to the topic. There was a lot more to be said about the infrastructure that is necessary to be built. So the latest book is the Data Lakehouse Architecture, which is really a much larger description of what needs to be understood about the data lakehouse.

<Piyush> 

Obviously I haven't read that, so I'll just ask: is that something which is also endorsed by Databricks, or does it go beyond Databricks technology?

<Bill> We're certainly friends with Databricks. But it's not a Databricks product. It's an architectural product.

<Piyush> 

Got it.

Because, you know, having worked with you all these years in the industry and been at conferences with you, I know how you think, and you don't typically align yourself with one vendor. You've maintained your independence and independent thinking, and that's what I like about you, Bill.

That's right. You have an opinion, a point of view, and you're never shy to express that. You're right about that. It leads to trolls and the stuff that you were talking about, some dirty letters being written to you in the past. But in today's world, people just troll you on Twitter or on social media, and of course making your opinions known is the sign of a thought leader.

And some can do it in a very civilized manner, like you, but then there are others who get into dirty roles as well. I won't go there, but certainly we have seen you and your dignified behavior as the industry has evolved and as the vendors who have touted their products have evolved as well.

Now, where do you see the role of any industry body coming in and trying to be there as the overarching architecture pattern setter? You know, in those days I used to look up to TDWI as one such organization. It used to be called the Data Warehouse Institute at that time, but now there's a different acronym for it. And there are a number of industry bodies. We talked about DAMA.

There's IADQ as well. Do you see the role of an industry body coming in and saying, well, let's not have a vendor have a play in it? And I won't relate it to what happened during the Hadoop euphoria, where there were multiple Hadoop distributions out there, and then the open source foundation was involved.

And so I want to see how you think about the industry bodies in the data industry.

<Bill>

Well, I'll tell you a little story. I used to speak at TDWI, and I had great hopes for TDWI as being something that could be a useful, important body. And then one day, for reasons known only to themselves, they told me that I was never welcome at TDWI.

That I could never speak there again. And I'm sure they had their reasons. I don't know who got mad at me, though I have my suspicions. I remember the day. I was a guest at one of the vendors' booths, and I wasn't even officially at TDWI. I was a guest at one of their booths, and one of the TDWI officials came over with a security man and said I was not welcome at TDWI.

And so again, I don't know. You know, I don't go into nastiness, but what happened happened, and I've been very disappointed in TDWI, because I thought in terms of industry leadership they had great potential. But they certainly have not fulfilled that potential.

And the problem with looking to any vendor for industry leadership is this: the vendor is not there to lead the industry. And I don't care which vendor you're talking about, this is all of the vendors. The vendor is there to sell their product, and that's what vendors do.

And so vendors are kind of disqualified from having a position of industry leadership. I can't speak to any other organizations. Quite frankly, I wish there were an organization that was interested in industry leadership. I wish that to be the case, and if they ever arise and want to have my participation, I'm more than happy to participate.

Vendors are not the answer. TDWI certainly isn't the answer.

<Piyush> 

So what about OSF? DAMA has been there in the data field, but now I'm transitioning to, you know, what OSF has done for software, and Linux in particular, to revolutionize the computing industry per se. Do you think there's something coming for decentralization, and has anybody in the industry approached you to create something like that?

<Bill> 

I have not had one message from OSF. They've never talked to me. I don't know who they are. I don't know what they do. If they're interested in industry leadership, I'd love to talk with them. But to this date in 2023, I've never had one conversation. Now, DAMA is another story.

DAMA has got some really good intentions. But the organization of DAMA has been, for a variety of reasons, a shaky organization from the beginning. They don't have the financial or organizational wherewithal. They're in the right place, but they haven't been executing very well.

<Piyush> 

Yeah, it gets tricky with the nonprofits and the industry consortia, and sometimes there are pushes and pulls with respect to funding from one vendor versus another, and somebody's agenda getting pushed. So that's why I always look for a day and age where we'll have vendor-neutral bodies who will maintain that pristineness of concepts and make the industry go forward.

And of course, there's a lot to be learned from the engineering profession as such. There are other professions which are a lot older than data and computer science, and we definitely have a thing or two to learn from those professions. Look at the civil engineering profession. They've got standards.

Look at mechanical engineering. And just FYI, I lead an organization called American Society of Engineers of Indian Origin, and we do a lot of stuff trying to standardize on things and doing things for the next generation. So I totally get where you're coming from, and let's see what trouble we can make by shaking things up and doing things for the data industry together.

So I really appreciate you being very candid in this conversation. And Bill, one last question I would like to pose: what do you think about the modern data stack today? There's a lot of talk about distributed approaches, especially with blockchain and, you know, what you see with crypto. Everybody's been talking about decentralized mechanisms.

So as an equivalent of that, what do you see for the data industry in that space?

<Bill> Well, okay, I'm gonna give you my honest opinion. The modern data stack is an attempt to understand architecture through the eyes of a vendor. And as such, it leaves a lot to be desired.

People recognize they need architecture. People know that intuitively, and they do. But the vendors don't like architecture, because it means that the vendor may have to say something good about a competitor. Vendors don't like architecture. However, people that are doing this kind of work, they know they need architecture.

So the vendor stack that we have today is to glue together different pieces of technology and form an architecture, and so that's why I'm not a big fan of the vendor stack in that sense.

<Piyush> 

The data mesh has been getting a lot of reviews. But I also know that it's not a fully fleshed out concept, and it's still in its infancy.

In your line of work, has something like that appeared, or have any of your clients asked you things about the modern data stack and the data mesh?

<Bill> 

I get asked about data mesh probably two times a day, every day. And so I think that the general notion of outlining the problem by data mesh is very good.

I give them high marks for that. The thing that disturbs me about data mesh is in terms of looking at or thinking about the integrity of data. Let me try to explain this a different way. You've got all of this machine learning, all of this AI stuff out there, and it's all wonderful.

It has very good things. But all of them start with the assumption that the data that they're operating on is valid, believable data. And the problem is, if you try to put machine learning and artificial intelligence and data mesh on top of data that isn't believable, valid, and useful data, then you've got nothing.

AI is wonderful, but it doesn't work on data that is improper data. So AI, machine learning, data mesh, ChatGPT, and all this stuff, they all say, well, here's the data. Well, that's not true. Oh boy, is that not true.

And if they started with, well, let's start with valid, believable data, then work on data mesh, AI, or machine learning, then we'll go. But they don't do that. They say, oh, let's go right straight to the fun part. But the foundation that we're building on is not a solid foundation.

And so I don't wanna be critical of data mesh. From what I've read, there are a lot of very good things that they do. But the foundation matters, and vendors hate that message. Why? Because going back and finding and getting the right data to operate on is hard, complex, ugly business, and vendors run from the integration of data.

<Piyush> Trust me, Bill, you are expressing my innate thoughts as well, because, you know, I've been a management consultant in the data space with PW, PwC, and IBM, and now with Google and other cloud technology providers that I work with. Building a solid data foundation has always been the message that I've been giving to my clients.

And as you know, I've worked with IADQ, the data quality organization. For 10 years I was on their board, working with Larry English and Tom Redman and Dr. John Talburt, who have been proponents of data quality. They've written books about this and are very much in sync with this.

But you know, a lot of the time that's dirty work. That's very high intensity work, which not many people want to put effort into. They say, oh my goodness, you just ask a couple of prompts and questions to ChatGPT and you get the answer, so why should we do the work? The hard and dirty work?

And that's the way the modern generation has started believing in the last year. They've been looking at ChatGPT and saying, oh wow, AI gives us everything and it's all available for free. But there's no free lunch here. If we don't clean the legacy data, if we don't standardize it, if we don't extract value from it, then, you know, it's wasted.

There's a lot of hidden data that can add value to an organization, and that's the structured to unstructured combination and integration of the data that you have been talking about in your data architecture stuff. It's still as valid as it was earlier, and what we have to do is figure out how we integrate the modern thinking into it and how we take it forward.

<Bill>

Absolutely. You know, I liken integrating data to planting tomatoes in the springtime. If you're gonna go plant tomatoes in the springtime, you're going to get your hands dirty. I don't know, maybe there is a way to plant tomatoes without getting your hands dirty. But my wife and I certainly don't know it.

We get our hands dirty. Vendors and consultants don't like getting their hands dirty, but that is what you've got to do. There's no way around it. And, you know, this ChatGPT thing, you know what it reminds me of? It reminds me of Elizabeth Holmes and FTX. Investors, what do they do?

Investors love the glitter. They love the sexy part. They love the wonderful part. But in terms of the reality of does it work, does it really do what it says it does, investors would rather throw their money away with Elizabeth Holmes than go in and solve what the real problem is.

And so when I see this ChatGPT, and by the way, the ChatGPT thing is really, really elegant. I mean, I don't think anybody that sees the ChatGPT stuff will say this is not elegant, because it is. But at the end of the day, it's a toy, because it doesn't address the root problem of getting the validity of information.

But investors, do they care about that? No, they don't care about that.

<Piyush> 

But one thing it has done is trigger the imagination of the next generation to be interested in this and democratizing AI in not only our industry, but in our general life as well. So I hope newer applications built on solid foundation will come and the industry will adopt it in a sustainable manner.

It'll take our industry forward. So with that, Bill, thank you so much for being here. Any last message to the youngsters listening who are just getting started in the field of data? What is the one-line message that you would have for them?

<Bill> Well, let me tell you something. If I were coming out of college and going into the industry: the closer you can get to where business decisions are made, the better you're going to do in your career and the safer you're gonna be.

Because when organizations have layoffs, guess who they lay off first? They lay off the people that are not close and important to decision making. So that's one piece of advice.

<Piyush> 

Wonderful. Thank you very much, Bill, for being here. And thank you for all the wisdom that you shared with me and my audience today. So with gratitude from my heart to yours, this is my way of saying thank you, and have a safe and healthy life.

Thank you.

Interesting thoughts at the intersection of Science, Technology & Art. The Digital Agenda is all about Data, Analytics, Cloud, AI/ML and Emerging technologies & strategies that are transforming not only companies but also entire industries and even our lives.