
The Data Science, Big Data, Data Analytics, Artificial Intelligence and Machine Learning Hype

Not only in Gartner’s Hype Cycle for Emerging Technologies but in nearly every blog and newsletter, the topics Data Science, Data Analytics, Big Data, Artificial Intelligence (AI) and Advanced Machine Learning (ML) have been number one for some months. The hype around these technologies is at its peak. Smart Factory (Industry 4.0) also contributes to this, because one of the four pillars of Smart Factory (Industry 4.0) is Data Analytics and Big Data.

But how do all these relate to each other?
The basis for all the listed topics is data, which is first created and saved from various sources (sensors on machines, user behavior on websites, applications and computers, and many more), then archived and finally analyzed to answer specific questions, to find patterns or to reveal special constellations.

Data is the golden asset for a company in the future, and it is very important to save and archive that data now. It is absolutely worthless to tell everybody that we could have all the data, e.g. for transactions, customer behavior, machine processes and application logs, if we don’t activate or install the necessary sensors and don’t store and archive this data. Only when as much data as possible is saved from start to end of a process, including the data of the final result, can a person called a Data Scientist use this data and try to answer questions which cannot be answered otherwise. This leads EMC to the prediction that “the amount of stored data is growing faster than ever before and experts states that by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet” [1].
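To get a feeling for the scale of the quoted figure, it can be converted into a daily volume per person. The following is only a back-of-the-envelope sketch: the 1.7 megabytes per second comes from the quote above, while the function name and the decimal conversion (1 GB = 1000 MB) are assumptions made for illustration.

```python
# Figure taken from the quoted prediction: megabytes per second per person.
MB_PER_SECOND_PER_PERSON = 1.7
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_gb(mb_per_second=MB_PER_SECOND_PER_PERSON):
    """Convert a per-second rate in MB into a per-day volume in GB.

    Assumes decimal units (1 GB = 1000 MB); this is an illustrative choice.
    """
    return mb_per_second * SECONDS_PER_DAY / 1000


if __name__ == "__main__":
    print(f"{daily_volume_gb():.2f} GB per person per day")
```

Under these assumptions the rate works out to roughly 147 GB per person per day.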

But what is the difference between Data Science, Big Data, Data Analytics, Artificial Intelligence and Machine Learning?
With the recent boom around these topics, a lot of confusion about the terms has also arisen. First of all: there is no clear definition. Many companies and universities define these terms differently, but most describe Data Science as the overall umbrella over the Data Analytics, Big Data, Artificial Intelligence and Machine Learning topics. Most also use the terms Data Analytics and Data Analysis synonymously.

Big Data refers to large and complex data sets (volume and variety) that are much larger than traditional data sets and are processed at a higher speed (velocity). Volume, variety and velocity (called the 3Vs) are the three defining dimensions of big data. For more information about traditional data sets, you might also have a look at Do we still need an Enterprise Data Warehouse?.

When we think about the traditional “3V’s”, explained above and widely accepted in the industry as a definition, we recognize that enterprises have been handling these for more than a decade now without problems. So there must be another definition for Big Data.

I will stay with three V’s, but include the value we generate for the business from the analysis of the data. That is the difference to simply dealing with volume, variety and velocity. So I think we will be better served with ‘business value’ as the first ‘V’. Besides that, it is important to successfully combine your analytic capabilities, your source data and your business needs. Our second ‘V’ should therefore be the vision of what is required to achieve that. The complexity of every very large enterprise today requires our new third ‘V’, virtualization, to simplify and accelerate the efforts behind our new first two ‘V’s.

To explain the remaining three terms I will write separate posts, because otherwise this post would get too voluminous. So stay tuned for the next post.


  1. EMC: IDC Digital Universe Study: Big Data, Bigger Digital Shadows and Biggest Growth in the Far East 2011.
    Retrieved: 14.06.2017.


Do We Still Need a Data Warehouse?

Do we still need an Enterprise Data Warehouse?

While studying for a Microsoft data warehouse exam, I asked myself whether a traditional enterprise data warehouse is still needed today and whether the time I’m spending on my studies is worth it. I think there is no question that data has become more and more important and is nowadays a strategic asset for companies to transform their businesses and uncover new insights. But does a traditional data warehouse fit into that?

A data warehouse that is categorized as “traditional”, and that is what my studies are about, has the main goal of being a central repository for all historical information in a company, on the assumption that the data is captured now but analyzed later. For this, data from transactional systems like ERP, CRM and LOB applications is extracted, transformed and loaded (ETL): normally it goes first into a staging area, is then cleansed and enriched, and is afterwards transferred into tables, that is, a relational schema, in the data warehouse. The resulting data warehouse becomes the main source of information, a central version of the truth, for report generation, analysis, and presentation through ad hoc reports, portals, and dashboards.
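The staging-then-cleanse flow described above can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database as a stand-in for the warehouse; the table names (`staging_customer`, `dim_customer`), the sample rows and the cleansing rules are all hypothetical and not taken from any specific product.

```python
import sqlite3

# Hypothetical rows as they might arrive from a CRM export (raw, uncleaned).
RAW_CUSTOMERS = [
    ("1", " Alice ", "DE", "120.50"),
    ("2", "Bob", "de", "80"),
    ("2", "Bob", "de", "80"),          # duplicate record
    ("3", "", "US", "not-a-number"),   # unusable record
]


def run_etl(conn, raw_rows):
    """Load raw rows into a staging table, then cleanse them into a typed table.

    Returns the number of rows that reached the warehouse table.
    """
    cur = conn.cursor()

    # Staging area: everything is kept as text, exactly as extracted.
    cur.execute(
        "CREATE TABLE staging_customer (id TEXT, name TEXT, country TEXT, revenue TEXT)"
    )
    cur.executemany("INSERT INTO staging_customer VALUES (?, ?, ?, ?)", raw_rows)

    # Warehouse table: a relational schema with types and a primary key.
    cur.execute(
        "CREATE TABLE dim_customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "country TEXT NOT NULL, revenue REAL NOT NULL)"
    )

    # DISTINCT drops exact duplicates before cleansing.
    staged = cur.execute(
        "SELECT DISTINCT id, name, country, revenue FROM staging_customer"
    ).fetchall()

    loaded = 0
    for id_, name, country, revenue in staged:
        name = name.strip()
        try:
            revenue_value = float(revenue)
        except ValueError:
            continue  # skip rows whose revenue cannot be parsed
        if not name:
            continue  # skip rows without a customer name
        cur.execute(
            "INSERT INTO dim_customer VALUES (?, ?, ?, ?)",
            (int(id_), name, country.upper(), revenue_value),
        )
        loaded += 1

    conn.commit()
    return loaded


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(connection, RAW_CUSTOMERS), "rows loaded")
```

The point of the sketch is the order of the steps: the schema of `dim_customer` is fixed before any data arrives, and only rows that fit it are loaded.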

What insiders have recognized is that the data warehouse described above is undergoing a transformation. Virtualization and moving resources to the cloud is one reason. Another reason is that organizations try to incorporate insights from data that doesn’t fit the traditional relational database model, and that the velocity at which that data is captured, processed and used is increasing. Companies are now using real-time data to change, build, or optimize their businesses as well as to sell, transact, and engage in dynamic, event-driven processes like market trading. The traditional data warehouse simply was not architected to support near real-time transactions or event processing, resulting in decreased performance and slower time-to-value.

A modern data warehouse has to support workloads of relational and non-relational data, whether they are on-premises or in the cloud and whether they use on-premises solutions or solutions and services in the cloud. The so-called “Logical Data Warehouse” (LDW) or “Modern Data Warehouse” uses repositories, virtualization and distributed processes in combination. Instead of working through the requirements-based model of the traditional data warehouse, where the schema and the data collected are defined upfront, advanced analytics and data science use an experimental approach of exploring answers to ill-formed or nonexistent questions. This requires examining the data before it is curated into a schema, allowing the data itself to drive insight.
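Examining data before it is curated into a schema can be as simple as profiling which fields the raw records actually contain. The following is a small sketch under stated assumptions: the records are hypothetical JSON lines, and `profile_fields` is an illustrative helper, not part of any warehouse product.

```python
import json
from collections import Counter

# Hypothetical raw events; the fields vary from record to record.
RAW_EVENTS = [
    '{"device": "press-01", "temperature": 71.5}',
    '{"device": "press-02", "temperature": 69.0, "vibration": 0.3}',
    '{"user": "u42", "page": "/pricing"}',
]


def profile_fields(lines):
    """Count how often each field name occurs across the raw JSON records."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        counts.update(record.keys())
    return counts


if __name__ == "__main__":
    for field, count in profile_fields(RAW_EVENTS).most_common():
        print(field, count)
```

Only after such a look at the data would one decide which fields deserve a column in a relational schema, which is the reverse of the upfront, requirements-based approach.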

So the recommendation, and the answer to the opening question, is that companies should use both approaches, and that established data warehouse teams should collaborate with this new breed of data scientists as part of a move towards the logical or modern data warehouse.


A cloud for everyone on every device.

Microsoft unveils Azure tweaks before partner conference

SAN FRANCISCO – Microsoft hopes to steer attention away from this week’s layoff news with the kickoff of its Worldwide Partner Conference July 12-16 in Orlando.

While CEO Satya Nadella delivers the keynote July 13, he may have to briefly address the topic of 7,800 Nokia employees who will be let go as the software giant continues to untangle itself from that acquisition. But the focus will be on urging the company’s army of global resellers to guide customers toward its growing cloud business, Azure.

To that end, Microsoft announced Friday that its Power BI business analytics service would be coming out of beta on July 24. The service promises to allow improved, streamlined access to cloud-based data.

“We believe Power BI is, by a very wide margin, the most powerful business analytics SaaS service,” wrote James Phillips, vice president of Microsoft’s Business Intelligence Products Group, in a blog post. “And yet even the most non-technical of business users can sign up in five seconds, and gain insights from their business data in less than five minutes with no assistance, from anyone.”

The company promises to unveil other Azure improvements during the conference, which also will feature talks by COO Kevin Turner as well as inspirational speeches by the likes of Tommy Caldwell and Kevin Jorgeson. The two mountain climbers recently completed a daunting ascent of one of El Capitan’s toughest routes – the Dawn Wall – this past January in Yosemite National Park.
