Social Media Bots

Do Evil – The Business Of Social Media Bots

Warning: reading this might lead you to lose faith in the media and marketing world! Oh – you didn’t really trust them to begin with? Well, let’s dig right in and meet the now-removed Twitter account @AI_AGW. He would hold an intelligent debate about global warming with you. But as smart as he was, you may be surprised to find that he was, in reality, a bot – a computer program. Bots like him can not only talk to you, they can easily skew algorithms, influence your opinion – or, in some cases, cause a lot of trouble.



Before we dig into the dark world of bots, let’s find out what they are. Bots are algorithms acting within social media networks, but to the outside world they look like real users. They come in all shapes and sizes – some are very simple scripts, others are borderline-perfect imitations of humans. And there are loads of services that will sell you bots, ranging from bots that will like whatever you post, to fake followers, to much more.

Building them is easy. You can actually try this at home. All you need is an account, an RSS feed, and maybe $10 USD for 1,000 fake friends – all of them bots (watch this movie to learn about @spotthebot and how he was built). Or better yet, download your very own bot software and wreak havoc on social networks from the comfort of your own home, within mere minutes. Much of this is even freeware (check out GitHub, for example). If you need a bot that can actually hold conversations with you or others and pretend to be human, you might check the Gonzales tutorial for code.
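To make this concrete, here is a minimal sketch in Python of what such an RSS-driven bot could look like. This is a toy illustration, not the code from the tutorials mentioned above: the feed content is made up, and the actual posting call to the network is stubbed out with a print.

```python
# Minimal sketch of an RSS-driven social media bot: parse a feed and
# format each item as a post. Posting to the network is stubbed out.
import xml.etree.ElementTree as ET

def make_post(title, link, limit=140):
    """Trim the title so that title + space + link fits the character limit."""
    room = limit - len(link) - 1
    if len(title) > room:
        title = title[:room - 1] + "…"
    return f"{title} {link}"

def posts_from_rss(rss_xml, limit=140):
    root = ET.fromstring(rss_xml)
    return [
        make_post(item.findtext("title"), item.findtext("link"), limit)
        for item in root.iter("item")
    ]

# Invented feed content, for illustration only.
FEED = """<rss><channel>
  <item><title>Global warming report released</title>
        <link>http://example.com/1</link></item>
</channel></rss>"""

for post in posts_from_rss(FEED):
    print(post)   # a real bot would send this to the network instead
```

That is the entire trick behind the simplest bots: a feed, a formatter, and a posting loop.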

Social media bots can be scarily natural. A study showed that 30% of users can be deceived by a bot. Well-made bots can even gain your trust. For example, meet Lajello, a fictitious member of a book lovers’ network. He became the second most liked and appreciated person within this network. Why? Because he automatically recommended books to every other user, like an Amazon recommender system. Lovely and friendly, right? Thus it should not be a surprise that 1 in 5 of us accepts unknown friend requests, openly letting bots into our world.
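A bot like Lajello needs nothing fancier than item-based co-occurrence: recommend the books most often liked together with the ones a user already has. Here is a hedged sketch of that idea – the users, books, and counts are invented for illustration, not data from the actual network.

```python
# Toy item-based recommender: score candidate books by how often they
# are co-liked with the books a user already has on their shelf.
from collections import Counter
from itertools import combinations

ratings = {                      # user -> set of liked books (invented data)
    "ann":  {"Dune", "Foundation", "Hyperion"},
    "ben":  {"Dune", "Foundation"},
    "cara": {"Dune", "Neuromancer"},
}

# Count how often each ordered pair of books is liked together.
co = Counter()
for books in ratings.values():
    for a, b in combinations(sorted(books), 2):
        co[(a, b)] += 1
        co[(b, a)] += 1

def recommend(user, n=2):
    owned = ratings[user]
    scores = Counter()
    for book in owned:
        for (a, b), count in co.items():
            if a == book and b not in owned:
                scores[b] += count
    return [book for book, _ in scores.most_common(n)]

print(recommend("ben"))   # books co-liked with Ben's shelf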


Bots are actually more common than you might think. Twitter fights bots via legal action and various machine-learning programs. Still, from time to time, researchers dig in and find that, for example, 7% of tweeps are not humans but spam bots. Companies like StatusPeople have built tools to spot fake Twitter followers, and you will be surprised to see how real some of them look. For example, 99.9% of @spotthebot’s followers are fake… check them out. They are all bots!
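Fake-follower checkers of this kind reportedly score accounts on a handful of public signals. The sketch below illustrates the general idea with invented thresholds – it is not StatusPeople’s actual scoring rule.

```python
# Toy fake-account score: each suspicious public signal adds one point.
# Thresholds are invented for illustration.
def fake_score(acct):
    score = 0
    if acct["tweets"] < 5:                        # barely any activity
        score += 1
    if acct["followers"] < 10:                    # nobody follows back
        score += 1
    if acct["following"] > 50 * max(acct["followers"], 1):
        score += 1                                # mass-following pattern
    if acct["default_avatar"]:                    # never set a profile image
        score += 1
    return score                                  # 0 = likely real … 4 = likely fake

bot = {"tweets": 0, "followers": 2, "following": 1900, "default_avatar": True}
human = {"tweets": 3200, "followers": 540, "following": 410, "default_avatar": False}
print(fake_score(bot), fake_score(human))
```

No single signal proves anything; it is the combination of several that makes an account look fake.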


So what is the problem with bots? An automated program like @AI_AGW talking about global warming isn’t all that bad, right? Or what about the bot that tweets at you when your plant needs water? (Seriously – read about it here.) In reality, bots can do things beyond our wildest dreams or nightmares. Here are some examples:


Can bots make you famous? Oh yes they can. Read here about how Chris Dessi (@ChrisDessi) created some fame with as little as 50k followers.


How about when bots persuade you to buy stuff? This is similar to email spam, or the nagging calls from insurance companies asking you to please buy their life insurance.


Let’s create bots that harm others – for example, our competitors. Think about what would have happened if 10,000,000,000 bots had revealed their identity shortly before the Facebook IPO. What would this have caused? (I actually know someone who tried to build such a system, except all of his bots got ‘killed’ shortly before the IPO. Facebook knows what’s at stake here.)

To harm others, even very simple bots can be useful. Just sign your worst opponent up with a lot of fake identities and help the world discover it. Newt Gingrich, Mitt Romney, and the German Conservative Party (CDU) are just some examples of cases where fake-following was proven or assumed to have happened.


One of the real impacts of bots is to skew public opinion. If you have built an army of bots that like, read, and engage with a set piece of content, you can influence what is trending. China tried this with the so-called “5 Mao (50 cent) army“: over a quarter million bloggers who wrote articles for as little as 5 Mao per article to complement its government information politics. With my former startup (sold to WPP) I investigated this army for various clients, and we saw a steady decline in its power. Perhaps they were replaced by bots? This is not as unlikely as you might think.


During the Arab Spring movement, we measured how the government was disrupting protesters’ activities with continuous tweets. By spamming the “stream” with automated tweets, the government pushed important messages sent by activists lower on the page and out of sight.

It is safe to conclude that the bot business has now gone beyond the scope of marketers looking for fame and sales success. In fact, bots are now big government business. For example, the US Air Force revealed that it solicited Ntrepid, a California-based company, to create software that would enable it to mass-produce bots for political purposes.


If there is one common denominator between these various uses of bots, it is that they all clone human activity in the limited world of social media. On a large enough scale, they can skew the ‘trending topics’ algorithms of various social media networks – and according to this survey, journalists often trust these trends. Think about the power if the tweets from your bot come up in Google search results (now possible since Google and Twitter signed a deal to make tweets more searchable). Thus the real power over the media is owned by those who run millions of social media bots.
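Why is a naive trending algorithm so easy to skew? If “trending” is just mention counts over a recent window, a scripted bot army simply outvotes organic conversation. The numbers below are invented to illustrate the mechanism; real trending algorithms also look at burstiness against a baseline, which raises the bar but does not eliminate the problem.

```python
# Naive count-based trending: whichever hashtag has the most mentions
# in the window wins. A bot army trivially dominates such a metric.
from collections import Counter

def trending(mentions, top=1):
    """mentions: list of (user, hashtag) pairs in the current time window."""
    return [tag for tag, _ in Counter(tag for _, tag in mentions).most_common(top)]

organic = [(f"user{i}", "#news") for i in range(100)]      # real users
botnet  = [(f"bot{i}", "#buystuff") for i in range(500)]   # scripted accounts

print(trending(organic))            # the organic signal alone
print(trending(organic + botnet))   # bot volume takes the top slot
```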

Next week we will look into how one can spot bots and we will discover that some social media gurus have a lot of fake followers. If you cannot wait – just sign up for my newsletter or read chapter 6 in my book “Ask Measure Learn” by O’Reilly Media.

(republished from my Forbes Column)

The Stages of Analytics - Courtesy of Blue Yonder

Predictive Analytics – A Case For Private Equity?

[vc_row][vc_column width=”1/1″][vc_column_text]Framed Data raises $2M, 6Sense raises $12M, Reflektion raises $8M… The list goes on and on. What these companies have in common, aside from multi-million dollar investments, is that they are all in the market of predictive analytics. Predictive analytics is the art of making big data work by using past data to forecast future behavior. Who will churn? Who will buy what? Predictive analytics is at the core of Data Science.

Blue Yonder is one of those companies, and they just secured funding of $75 million from the global private equity firm Warburg Pincus. This is a unique deal in many respects: it was the biggest deal for a predictive analytics company in Europe, and it was done by a PE firm. It is thus worth taking a second look – what is happening here?

I got an exclusive interview with Blue Yonder’s CEO Uwe Weiss (@WeissU). He explains why the market for predictive analytics is set for growth, how analytics meet transactions, and why the future is in the automation of mass analytical decisions in real time.

Read the full interview here or see below for the highlights and amazing insights of my interview with Uwe:


The Stages of Analytics – Courtesy of Blue Yonder

Analytics benefits the customer. 

Predictive analytics provides customers with a number of benefits. Accurate real-time forecasts enable automated decision-making such as demand forecasting, dynamic pricing, replenishment, churn prediction, and predictive maintenance, resulting in significant cost savings and operational efficiencies. The industries benefitting from predictive analytics cover not only retail and consumer goods but, increasingly, finance, travel, energy, and manufacturing.

Automated analytics is the future of predictive analytics.

Gartner has defined three different kinds of analytics: descriptive, predictive, and prescriptive. Descriptive analytics describes what has been around since the dawn of business analytics in the 1980s, traditionally known as reporting with simple descriptive tools such as frequency distributions, charts, and graphs. Predictive analytics uses models describing past data to predict future trends. Prescriptive analytics provides recommendations to front-line workers. Uwe believes that there is a fourth category, “automated analytics”. Tom Davenport recently postulated that automated analytics is a further extension of prescriptive analytics, eliminating the need for a human to act on prescriptive analytics. So, instead of a middleman changing prices or determining what kind of marketing email to send to the client base, it can be done through applications. Uwe believes that “99 percent of operative business decisions can be automated. If you automate business decisions, you need predictive analytics.”
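The “automated analytics” idea – the prediction feeding straight into the decision, with no analyst in the loop – can be sketched in a few lines. The forecast and pricing rules below are toy stand-ins invented for illustration, not Blue Yonder’s actual models.

```python
# Sketch of automated analytics: a prediction flows directly into an
# automated business decision (here, a pricing rule) without a human.
def forecast_demand(history, window=3):
    """Naive moving-average forecast of tomorrow's demand."""
    return sum(history[-window:]) / window

def decide_price(base_price, stock, expected_demand):
    """Automated rule: discount to clear excess stock, raise price when short."""
    if stock > 2 * expected_demand:
        return round(base_price * 0.9, 2)    # overstocked -> mark down
    if stock < expected_demand:
        return round(base_price * 1.1, 2)    # scarce -> mark up
    return base_price

sales = [80, 95, 110, 105, 100]              # invented sales history
demand = forecast_demand(sales)              # (110 + 105 + 100) / 3 = 105.0
print(decide_price(10.0, stock=300, expected_demand=demand))  # prints 9.0
```

The point is the wiring, not the model: once the forecast drives the rule, nobody has to read a dashboard before the price changes.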

The predictive analytics market is established and growing.

The market for predictive data is forecast to grow at a compound annual growth rate of 34% from 2012 to 2017, reaching $48 billion, according to Gartner. Venture capitalists were quick to make early-stage investments in predictive analytics offerings. Uwe believes that any new startup in this space has to show how it is different in order to break into this already-growing market. “The technology is at the plateau of productivity. People can use this technology now and produce ROI.” Uwe believes this is because the need for predictive analytics is fairly independent of traditional economic cycles. This view is matched by Gartner: already in 2012 they stated that predictive analytics technology is in a mature state. It is the lifeblood of a real-time enterprise and no longer an art in a “dark corner”. Working under these assumptions, it is logical that Uwe sees the opportunity for a bigger enterprise-software play. The time of predictive analytics being in a corner is over… Let’s go for scale.
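The compound-growth arithmetic behind that figure is easy to sanity-check: 34% per year over five years multiplies the market by roughly 4.3, so a $48 billion figure in 2017 implies a base of roughly $11 billion in 2012.

```python
# Quick check of the compound annual growth rate (CAGR) arithmetic.
cagr, years, final = 0.34, 5, 48e9
multiplier = (1 + cagr) ** years           # 1.34 ** 5, about 4.32
implied_2012_market = final / multiplier   # about $11.1 billion
print(round(multiplier, 2), round(implied_2012_market / 1e9, 1))
```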


Uwe Weiss @ Blue Yonder – Foto: Martin Leissl

Let’s go for scale! 

But hold on – scale? So far, we see a lot of players offering niche expertise in one specific area, such as churn management or marketing segmentation. Why? Because solutions are built around specific domain knowledge. The role of domain knowledge in data science is often discussed; at the Strata conference in 2012, there was a lot of debate over whether domain knowledge is important or can be replaced by brute force. Senior experts in this field such as Monica Rogati (@mrogati) and Xavier Amatriain (@xamat) from Netflix showed why domain knowledge is very important.

As Uwe explains, one can “build their software architecture around their domain.” The advantage is clear: in doing this, “you can run very fast and catch a lot of market share.”

Uwe believes that it is dangerous to remain in only one domain. “We go for the broader perspective and we have seen that we can – with our people and our engineering – tackle more domains and add domain knowledge relatively easily.” He has geared Blue Yonder towards handling the bigger questions first – data digestion, time-series management, and the handling of diverse machine-learning algorithms. This generic predictive-applications architecture allows Blue Yonder to integrate various domain specialties into its platform very rapidly. Thus, instead of the software architecture being built around a domain, they bring domains into their software architecture.

Okay – I get it now. This is the new accord. This technology has reached the plateau of productivity. The market looks for an enterprise-ready platform that can scale by incorporating domain knowledge, one domain at a time… And that’s why a private equity firm has invested in Blue Yonder and in the area of predictive analytics.

Predictive analytics is stable against economic cycles and will continue to grow fast.

Uwe points out one more reason why a PE firm would be looking to make such an investment. Predictive analytics is stable against economic cycles. “People buy food. People go to restaurants. People use trains and buy tickets, and so predictive analytics will have its space. The fundamental belief [in predictive analytics] and the potential for explosive growth are responsible for private equity entering this area.” Even in an economic downturn, predictive analytics embedded into critical enterprise processes such as pricing, replenishment, logistics, and supply-chain yield management will be cost-effective. Predictive analytics will always be needed, and indeed the market opportunity here is huge.

When asked to comment on this investment, Joseph Schull, a Managing Director at Warburg Pincus, said that “the transformation of Big Data into actionable information is at the high-growth frontier of enterprise IT. Our investment in Blue Yonder is the result of an extended, international search for the opportunity to participate in this major new wave and growth opportunity.” It is now clear that Warburg Pincus invested out of a fundamental belief in the hyper-growth of this market.

With all of the excitement and development in the predictive analytics market, what’s next for Blue Yonder? What will the $75 million investment enable them to do?

“We want to place our bets on the right areas, so America is definitely the next step on our internationalization roadmap.” Even with the geographic expansion, Blue Yonder will continue to expand at the enterprise-software level. “Product-wise, we will focus on the domains we have defined for ourselves. This gives us a lot to do for the next 24 months. I am convinced that we will see a mass market for predictive applications from 2016 onwards. All of the Global 2000 leading companies in all relevant industries will need to adapt and use it.”

Thus, the prediction about predictive analytics is that there is more to come. Stay tuned – and subscribe to my newsletter to learn more about Big Data and why we actually want Small Data! The full interview can be read here.

(republished from my original FORBES article)[/vc_column_text][/vc_column][/vc_row]

Uwe Weiss @ Blue Yonder - Foto: Martin Leissl

Uwe Weiss On Predictive Analytics

[vc_row][vc_column][vc_column_text]LUTZ FINGER: Uwe Weiss – you are a long-standing entrepreneur and the CEO of Blue Yonder, a predictive analytics company. Thanks for being with us.

UWE WEISS: Thank you for having me.

LUTZ FINGER: You landed a really large deal last year. Your company got funded by Warburg Pincus. The deal was for a whopping 75 million USD – one of the biggest European investments in 2014. What does Blue Yonder do?

UWE WEISS: We create predictive application platforms. We were founded in 2008. We offer predictive analytics for various industries.

LUTZ FINGER: You used the words “predictive analytics.” That’s one of today’s hype terms. You did not use the words “big data” – how come?

UWE WEISS: You caught me. I think big data is a meta term, like the expression “internet of things.” These concepts make data accessible. Big data as well as predictive analytics have been on the market for a while, but predictive analytics has traditionally meant data mining and looking at past data. To expand, we needed an applications platform that helps decision-making based on analytics and large amounts of data. We are happy to work within the “big data” wave and we are happy to work with the “internet of things,” but whatever we do, we need to prove the value that these terms might bring.

Blue Yonder Logo

Blue Yonder – Forward looking – Forward thinking

LUTZ FINGER: Hasn’t predictive analytics been around since the 80s? We had business intelligence and everybody got excited about having big databases, which we didn’t call “big data” at that time. BI uses predictive analytics in order to do something that is actionable. Now, what has changed since the 80s – except that we are all 34 years older?

UWE WEISS: Many major technology trends need a while until they find broad use, and many things have to come together. You need time for new technology to find its way into the market. We are now in exactly the right place. This is seen in the transition of business intelligence. First there was reporting, then business intelligence analytics and visualization. Great companies have come out of this market, but what we understand about predictive analytics is that it falls neatly into the mold of the big data scheme. With much better hardware we can run predictions on more data, faster. There is the transactional software layer – enterprise resource planning, warehouse management, order fulfilment, ecommerce execution – and the analytical layer. These layers were traditionally separated in most software, from an architecture perspective. What is new is that these layers stay together, so analytics meet transaction.

LUTZ FINGER: Nice. Analytics meets transaction. That’s “tweetable.” You are in a super hot spot right now. In the last year, I have noticed a lot of companies claiming a similar space. Earlier, you mentioned the “internet of things.” Otto, a big retail chain in Germany, is one of your shareholders. Last year, Google Ventures invested $2 million in seed funding in Framed Data, which does user-behaviour predictive analytics. And Reflektion received a big investment from Intel Capital and Nike as well, supporting retailers. Lattice Engines got a $20 million investment, and they do marketing predictions. That’s a lot of big investments. And now you, with $75 million. This is the top of the top. What makes the difference?

Analytics meets Transaction

UWE WEISS: The predictive analytics technology and concepts have been on the market for a long time, as we talked about before. It is still a relatively early market when it comes to adoption at the enterprise level. We are talking about adoption cycles. Most people within the predictive analytics area try to find their way in the market using domain knowledge. We tackle the broader market of the underlying platform for predictive applications across domains. Generically, we have solved many of the tricky questions of data digestion, time-series management, and the handling of lots of diverse machine-learning algorithms, because we don’t think that there is a one-size-fits-all algorithm in the world. We believe in robustness and bring enterprise-level, 24/7, consumable cloud software on top of service. We are not just a tool and we are not restricted to one domain. We are ambitious enough to say that we can tackle various verticals based on a scalable product: generic predictive applications.

LUTZ FINGER: In this market, there is a debate about how much domain knowledge is needed versus how much technology knowledge is needed. You have created a platform out of this technology knowledge and go one industry at a time, instead of building a solution around domain knowledge only. Is that the way you are tackling it?

Domain Knowledge is Important but not Everything

UWE WEISS: It is dangerous to remain in only one domain, because you tend to build your software architecture around your domain. You can run very fast and catch a lot of market share, but then you get stuck. We go for the broader perspective and we have seen that we can – with our people and our engineering – tackle more domains and add domain knowledge relatively easily. We usually don’t have to start from zero and are able to integrate various domain specialties into our platform rapidly. We don’t tackle all of the vertical markets. At the moment we focus on retail and consumer goods, completing our roadmap there, but we have a growing number of customers in travel, manufacturing, and energy. If a new market looks healthy from a growth perspective, we enable ourselves to go into more domains. For our customers, that means solving more than one problem at a time.

LUTZ FINGER: I published the book “Ask Measure Learn.” I claim that domain knowledge is important in the “Ask,” which is the business-oriented part. What would make you guys different from the platforms that are out there already, like RStudio or SPSS?

UWE WEISS: I see one difference between ourselves and other players. We learned early on that there is a market for tools, applications, and solutions. If you tackle this market, you will always be a part of critical enterprise processes like pricing, replenishment, customer management, and customer complaints. You have to build a platform with high availability – 24/7 availability – and you have to have super robust machine-learning algorithms and domain knowledge. There has to be a transactional layer. That means we don’t only deliver findings and insight, we also execute millions of transactions for our customers. We set the prices in their e-commerce platforms, we make replenishment decisions, and so on. The transaction volume and throughput come into play. When you have customers who process hundreds of millions of transactions per day, you have to have a scalable platform which does not only contain the insight part – which you would probably call the “measure” and “learn” parts – but also the transactional part that can execute transactions and permanently integrate data sources into a streaming process. You also have to make decisions. That’s a differentiator we see in the market, and we see the market moving more and more towards supporting companies to become predictive enterprises. It’s great to have predictive analytics insights from open-source tools or other available tools, but in the end, when you have these great insights, you want to transact; you want these insights to become part of your transactional software landscape. That is the path we are following in the market, and it is the right one for us.

LUTZ FINGER: There are two approaches here. One approach is to use domain knowledge and create a specific solution; however, in enterprise-software terms, that becomes harder to scale. You therefore built the platform with the scaling piece and add domain knowledge. You scale up and enable common frameworks for everybody.

Automated Decision Making

UWE WEISS: Our theory is that 99 percent of operative business decisions can be automated, and to automate these decisions, you need predictive analytics. You also need the automation piece and the execution piece, the throughput and the scaling piece. You need to be able to bring all of the bits and pieces of an architecture together into a complex configuration: data science, machine learning, in-memory databases, time-series robustness. Add to this enterprise release management and software release management. With these components, you have a platform which can also execute on analytic insight and automate. We tell our clients, “99 percent of your operative business decisions can be done automatically; the exceptions are the ones you should work on with your team.” 99 percent of decisions today are the same decisions you have made in the past. This means you didn’t learn from the data. The automation piece is essential for the usage of predictive analytics, so we see the tendency of analytics offerings getting into this transactional execution market. We also see enterprise software companies trying to glue on predictive analytics pieces.

LUTZ FINGER: So we are scaling up because we are using enterprise software, and we are automating the “science” aspect of this area of expertise.

UWE WEISS: Amazon, for example, has a recommendation engine with automated processes. They are the masters of business decision-making and recommendation automation. They are also the masters of using science and the enterprise software landscape.

Sustainable Growth Market

LUTZ FINGER: I was slightly surprised to see that the investment was provided by Warburg Pincus, because they are known as a private equity company. All of the other deals I listed were made by venture capital companies. How come?

UWE WEISS: Looking at the maturity of the technology, many of the VCs have made their early-stage investments in predictive analytics offerings. These successful companies are already in a place of growth, not in the seed or early phases. The technology today is ready to rock for exponential growth – it can be applied widely. Private equity investors have seen that, and they believe we are operating in a very sustainable growth market. It is healthy in terms of growth and independent of any short-term cycles or hype terms like “big data.” They see that large enterprises across many verticals require predictive capabilities. Predictive analytics is at the core of many services and initiatives. VCs have seen the opportunity in the market. We are convinced that predictive analytics is something all large enterprises, and also fast-growing enterprises, will use in the near future. Private equity companies believe in the long-term possibility of this market and that this market grows independently of cycles or economic downturns. I don’t think that their investment is fashionable; I think it’s because of their fundamental belief in this sector.

LUTZ FINGER: You noted that BI has been around since the 80s, but that it takes time for adoption. When we look at scaling up, we see the availability of data, we see that the machines are there, and therefore there is a scale effect which manifests itself in enterprise software.

UWE WEISS: Yup. The whole sector of analytics is capturing more market share and enhancing BI. Approximately 95 percent of BI budgets went into reporting, visualization, and dashboarding. This is a healthy, multi-billion dollar market with nice growth rates, in a growth phase. That is important to private equity. In an economic downturn, enterprises will still run this software and pay for it. If you are embedded in critical enterprise processes – like pricing, replenishment, logistics, supply-chain yield management – you are fairly independent of economic cycles, because people buy food, people go to restaurants, people take a train and have to buy a ticket, and so predictive applications will have their space. Fundamental belief in predictive analytics as a growth area is responsible for private equity entering this area. You will see more private equity deals, I think.

Fast-growing software service businesses need substantial investment to grow. This phase of the market has become attractive for private equity, and because of the ticket size and the growth, predictive analytics companies can raise capital right now.

Blue Yonder, Karlsruhe, 27.01.2015

Uwe Weiss @ Blue Yonder – Foto: Martin Leissl

LUTZ FINGER: Got it. The first thing I thought when I saw your deal was that it proves the Gartner hype cycle wrong. Gartner puts predictive analytics at the peak of the hype cycle, which would suggest that we will soon be slowing down. What you are saying is that we are in a stable situation. We are on the far end: predictive analytics is ready and we can roll it out.

Plateau of Productivity

UWE WEISS: I fundamentally think that the technology is at the “plateau of productivity”. People can use this technology now and produce ROI. Analysts already talk about prescriptive analytics. A week ago, Davenport published an article on “automated analytics.” I like this term – meaning that the analytics layer marries the transactional software layer. Predictive analytics is not a homogenous market. The tools have been in the market for a long time. Many people talk about machine learning and one-size-fits-all algorithms right now, in regards to various learning technologies – supervised learning, reinforcement learning, and deep learning.

You need to be capable of working with all of these learning technologies, but you also need to be capable of embedding them into enterprise processes. To do that, you need mature software. Your software has to work for many years. Building enterprise software hasn’t changed much in the past years: the fact that you are building a robust enterprise software infrastructure is something that you still need to prove to your customers. This has not changed.

LUTZ FINGER: I totally agree. You said that some of the funding will be used to internationalise and expand. You’re very well-known in Europe, and definitely super well-known in Germany. This investment was the biggest deal in Germany last year. What can we expect to see next from Blue Yonder?


UWE WEISS: We did the biggest software deal in Germany last year and one of the five biggest software deals in Europe, excluding ecommerce and technology-enabled businesses. We are based in Germany and the UK, and have a substantial operation also serving the Nordic countries. We foresee expansion into the US in 2015, and we already have a few interested customers there. I am convinced we will have our foothold in the US in 2015. We are one of the next companies coming out of Europe that will become international by also being established in the US. We want to keep a healthy mixture of European style and American style. We are also looking into Asia and the Middle East, but we want to place our bets on the right areas, so America is definitely the next step on our internationalization roadmap. Product-wise, we will focus on the domains we have defined for ourselves. This gives us a lot to do for the next 24 months. I am convinced that we will see a mass market for predictive applications from 2016 onwards. All of the Global 2000 leading companies in all relevant industries will need to adapt and use it.

LUTZ FINGER: Wow, that’s a cool prediction. I am going to go with that one. Thanks a lot for being with us.

UWE WEISS: Lutz, thanks a lot.[/vc_column_text][/vc_column][/vc_row]

Personality analysis

Give Me The Data! Why So Many Data Ideas Fail

[vc_row][vc_column width=”1/1″][vc_column_text]To build a data product, you need to have the right data. This sounds simple, but it is often forgotten by entrepreneurs and top managers alike. Data should not be confused with the product, or with the business question that makes it powerful. Even if you have a great idea, it comes down to having the right data to make it feasible.

I recently had lunch with a young and highly energetic entrepreneur. He envisions a database containing psychograms of people. Think about a CRM system that not only contains the phone numbers of your contacts but also a personality profile. Can you imagine what Frank Underwood, the main (and mean) character in “House of Cards”, could have done with such a tool? Thus, no question – even though I have loads of questions in regards to privacy and ethics – I get the value of that tool.

But unfortunately, ideas are easy to dream up – I have a whole scratch Evernote notebook full of what we could do with data. Is it feasible, though? If you are a startup person, you know this issue:

#Ideas are commodity. #Execution of them is not. – Michael Dell, Dell chairman and CEO

In the data world, Michael Dell’s quote can be adapted:

Ideas are commodity. #Data of them is not.

Likewise, Amazon did not set out to build a ‘recommendation’ engine. They thought about selling books online, and suddenly they had the data – and only then did they think about how to improve their services by using data and recommendation engines. Many of the big data war-stories you read today are based in this area. Another example is how UPS reduces maintenance costs by measuring vehicle performance and understanding patterns of use, then pre-emptively replacing parts before they break down.

I often get asked how to drive product innovation using data. The answer is not much different from any other kind of innovation. Alistair Croll (@acroll), who has interviewed many product managers and so-called innovation officers, breaks innovation down into three distinct classes: sustaining, adjacent, and disruptive (read more about Alistair’s ideas here). Data is most commonly used in ways Alistair calls “sustaining”. Take Sears as an example: the large retailer began using big data in 2010 to support its existing business model in areas such as combatting fraud, tracking the effectiveness of marketing campaigns, optimizing pricing, and more.

But the use of data can be disruptive as well. Take LinkedIn. It started out as a networking company. People connected via LinkedIn, sharing information such as their personal bio and which companies they worked for over time. With this kind of information, LinkedIn was able to match available talent with the right job opportunities. Their service became disruptive for the whole recruiting industry: finding the right talent became not only much faster but also more numbers-driven. Or take Google – it started as a search company. Only later did they realize that, by using their data, they could change the advertising industry for good. Data has the potential for disruption, and data will change our lives more than we can imagine today. On the other hand, having data in no way guarantees that disruption will happen.

The second way is to buy data or use publicly available data. I have built and sold a company based on this idea: Fisheye Analytics gathers news and social media data from across the world and analyzes it. These kinds of data are (apart from the infrastructure cost) free. Let’s get back to my lunch table and the young entrepreneur: public data might be a feasible way for him. He could use public data from social media to extract a psychological profile. For example, take IBM’s System U project. Under the leadership of Michelle Zhou (Michelle’s Facebook Page), the IBM team analyzes the Twitter stream. Michelle presented the technology to a small team at LinkedIn, and below you can see my ‘psychogram’.


(Picture: System U analyzes personality profiles based on tweets.)

I cannot comment on how well the tool works, mainly because whether this is the “right” data depends strongly on the question you want to answer. Compared to a real psychological test, this is surely nonsense. But what if you do not need a detailed test? Compare it with a thermometer: if you stick your hand out of the door in the morning to get a sense of the temperature, it will not be a good or accurate measurement, but it is completely sufficient to decide whether to grab a jacket.
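To give a flavor of what such a “rough thermometer” for personality might look like, here is a toy sketch in Python. The trait word lists and tweets are invented for illustration and have nothing to do with IBM’s actual models, which are far more sophisticated:

```python
# Hypothetical marker words per trait -- invented for illustration only.
TRAIT_WORDS = {
    "openness": {"curious", "art", "idea", "imagine"},
    "extraversion": {"party", "friends", "fun", "excited"},
}

def score_traits(tweets):
    """Count how often each trait's marker words appear across tweets,
    then normalize the counts into a rough per-trait share."""
    scores = {trait: 0 for trait in TRAIT_WORDS}
    for tweet in tweets:
        words = set(tweet.lower().split())
        for trait, markers in TRAIT_WORDS.items():
            scores[trait] += len(words & markers)
    total = sum(scores.values()) or 1  # avoid division by zero
    return {trait: count / total for trait, count in scores.items()}

tweets = ["Curious about this new art idea", "Party with friends tonight"]
print(score_traits(tweets))
```

Crude as it is, this is the thermometer-out-of-the-door level of measurement: useless as a clinical instrument, possibly sufficient for a coarse business decision.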

Thus, all this entrepreneur needs to do is implement something like System U and test whether it fits his business case! But be aware that public data does not create high barriers to entry for your business model. If your way of using public data is successful, anyone can copy it. Take Klout as an example. Klout used to be a service that analyzed someone’s online “influence”, using publicly available data from Twitter. Soon after its launch, many companies tried similar things. Klout has by now pivoted into a very slick trending-content tool. Other companies that measure influence, like PeerIndex, are now looking for more funds via crowdsourcing. Relying on public data is one of the difficulties in maintaining their business model.

On the other hand, public data is not necessarily a dead end if you create a competitive advantage somewhere else. Take Google as an example. They use linking data that is available to anyone who crawls the web. Google’s competitors figured this out after a while, but by that time Google had already learned more about its users and begun to build a competitive advantage that has been reinforced ever since.
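The linking data Google started from is the input to the PageRank algorithm, which can be sketched as a short power iteration over a link graph. The tiny “web” below is invented for illustration:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration over a dict mapping each page to its list of outlinks.
    Each page repeatedly passes a damped share of its rank to the pages
    it links to; rank mass concentrates on heavily linked-to pages."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iters):
        new = {page: (1 - damping) / n for page in links}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
        rank = new
    return rank

# Every other page links to "hub", so it earns the highest rank.
links = {"a": ["hub"], "b": ["hub"], "hub": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # "hub"
```

The crawl itself is the commodity; the ongoing user data Google layered on top is what became hard to copy.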

And the third way is B2B services: working with data from other businesses. Compass is a typical example (note: I am an advisor to Compass). They let companies plug in their data, and their service is to benchmark and compare them. Iris.TV is another example. As their COO Richie Hyden (@richiehyden8) explained to me, they use data from their customers – what people tended to watch after a given video – to estimate the best next video to play.

Today, a whole range of marketing companies use data from the sales process to make predictions. Instead of asking someone to fill out a survey about a potential client, these companies watch what the user is doing: How did he come to the website? What did he tweet about? In all of these cases, the data product depends on having the data.

The best data product idea is not worth anything if you have no data. Do you want to learn more about how to get data? Subscribe to my newsletter to get some free resources about data products.[/vc_column_text][/vc_column][/vc_row]

Data Demystified

Data Demystified

[vc_row][vc_column width=”1/1″][vc_column_text]”Data Demystified” is the best way to describe this year’s Strata Conference. The Strata data conference has grown substantially this year: more visitors, more talks, more vendors at the exhibition, and more space.

But most importantly, the topics presented have matured. Data is not only becoming easier to work with, the science of data itself has become demystified. It is no longer the task of a few highly specialized data scientists or engineers: today, data is available to everyone. Tweet now! Let’s look at some trends that go beyond the usual infrastructure sumo wrestling.


TREND 1 – Easier Data

Every infrastructure vendor – from storage vendors to database vendors – will tell you that you will get more data by buying their product. That is no surprise. However, this year one could see a trend toward easy data: getting data has become much simpler.

One way of getting data is scraping. Scraping used to be reserved for those who knew how to code. With the industry maturing, this is no longer the case. Take Andrew Fogg’s (@andrewfogg) startup import.io as an example. Last year, it was just an idea at the Strata startup showcase. This year, they presented a very good and stable solution at their own booth. import.io makes scraping as easy as point-and-click. I actually used it to help me find the best-priced car when I moved to the U.S. The important part is that this kind of scraping does not require code: everyone can do it. Everyone can become data-driven. We do not have to worry about the technology, but can simply formulate the right business question.
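For readers curious what these point-and-click tools replace, here is what scraping looks like in code: a minimal sketch using only Python’s standard library. It parses an invented car-listing snippet inline rather than fetching a live page, so it stays self-contained; a real scraper would download the HTML first (e.g. with urllib):

```python
from html.parser import HTMLParser

# Invented sample of a car-listing page.
HTML = """
<table>
  <tr><td>Honda Civic</td><td class="price">$8,500</td></tr>
  <tr><td>Toyota Corolla</td><td class="price">$7,900</td></tr>
</table>
"""

class PriceScraper(HTMLParser):
    """Collect the text of every <td class="price"> cell."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

scraper = PriceScraper()
scraper.feed(HTML)
print(scraper.prices)
```

Writing and maintaining parsers like this for every target site is exactly the chore that point-and-click scraping tools take away.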

Another company in this space is Enigma. They were my personal favorite in this year’s startup showcase. Their vision is also to make data more accessible: it is a data platform for government data. Open data – which was a big topic at Strata years ago – now exists, as governments have opened their data stores. However, this data is often not easy to use, as it is not in the right format. It is nothing a data scientist or analyst could not fix, but the trend nowadays is to make data retrieval easy. Enigma sources the data and makes it available to everyone. Again, the underlying idea is to focus on the business question, rather than the technology.

TREND 2 – Easier Clean Up

What Enigma is doing for public data is what Joe Hellerstein (@joe_hellerstein), co-founder of Trifacta, is doing for the data world at large. Today, the biggest part of data science is ‘wrangling’ data: in other words, cleaning and re-structuring it. Whether you want to transform European date formats into U.S. formats or fill in the missing values in a table, you generally need to code or script to make it happen. The trend at this year’s Strata is to make this task easier. Trifacta’s approach lets you point and click to do all kinds of ‘grep’-like regular expressions. The goal is to make dealing with data easier so that you do not have to worry about the technology anymore.
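As an illustration of the wrangling steps mentioned above (not Trifacta’s actual mechanism), here is a minimal Python sketch that converts European day.month.year dates to the U.S. format and fills in missing values with a default; the rows are invented:

```python
from datetime import datetime

# Invented raw rows, as they might arrive from a European export.
rows = [
    {"date": "31.12.2014", "amount": 120.0},
    {"date": "01.03.2014", "amount": None},  # missing value
]

def wrangle(rows, default_amount=0.0):
    """Normalize dates to U.S. month/day/year and fill missing amounts."""
    cleaned = []
    for row in rows:
        eu_date = datetime.strptime(row["date"], "%d.%m.%Y")
        cleaned.append({
            "date": eu_date.strftime("%m/%d/%Y"),
            "amount": row["amount"] if row["amount"] is not None else default_amount,
        })
    return cleaned

print(wrangle(rows))
```

Multiply this by dozens of formats and edge cases and you see why wrangling eats most of a data scientist’s time, and why tools that automate it are so attractive.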

There is, however, a second reason why I am mentioning Trifacta. The computer industry has long anticipated that we will be able to “write code” by pointing and clicking, making it ready for deployment without actually writing it. Trifacta has taken this idea to a new level. In the same way that programming languages meant the end of the assembly coder, companies like Trifacta might mean the end of the data wrangler.

TREND 3 – Charting, Charting, Charting

You see charts everywhere you go at Strata. Charts are often the first and easiest win: not only are they helpful for exploring the data, they often astonish your audience. Thus, it is no surprise that a lot of companies try to automate this process, starting with Tableau, the top dog, followed by newcomers like DataHero and Chartio. Their promise: with a lot of connectors, we can connect to everything – from spreadsheets to Hadoop – and then see all of the data charted. There is no need to think visually anymore – just load the data and look at the colorful charts. But have you ever heard your audience make a statement like this?

That is so cool… If only I could understand what it means!?

Of course, these tools will not solve a single business question for you. But they do give you a nice way of representing data, so that you have all hands free to think about the real measurement you want. What is the right way to display this data so that it fits your business?

TREND 4 – Easier Predictions

This last trend relates to pure data science. How many people in your organization can run naïve Bayes? Or an SVM? Or k-nearest neighbors? Not many… And this is true not only for you, but for many organizations (unless you are a company like LinkedIn – and yes, we are looking to hire even more of the aforementioned people!) Tweet to us if you want to join.

With the data industry maturing, data science is maturing as well. Easy plug-and-play solutions have become more and more available. For example, take a look at companies like, co-founded by Joshua Bloom (@profjsb), or Skytree, founded by Martin Hack (@mhackster). They offer tools to simplify predictive algorithms. It is like the WEKA package on steroids: just upload your data and then score it, rank it, rate it… all automated, worry-free. Again, the underlying trend is to free us from the technology so that the focus is put on the business.

The point about demystifying data science was best made by John Foreman (@john4man), author of the book Data Smart. To demystify fancy artificial-intelligence packages, his tutorial trained the audience to build machine learning programs in Excel. Really? Excel? I thought John was out of his mind. But he is not. It works beautifully. According to John:

Artificial intelligence is just counting stuff…. Excel can do this.

And thus, after 45 minutes of Excel operations, I had a naïve Bayes model that classified 19 out of 20 tweets correctly. It was a lot of fun. Surprisingly enough, John was not the only one talking about Excel as a tool for data science: Felienne Hermans (@felienne) from the Delft University of Technology introduced several plug-ins for Excel (see her blog) to make it a better tool for the data world. This also shows how much the data world has matured: we have started to offer tools that every business consultant knows, so that they can think with us about the right application of data.
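John’s point that artificial intelligence is “just counting stuff” can be illustrated with a naïve Bayes classifier built purely from word counts, the same kind of model the Excel exercise produced. The tiny training tweets below are invented, and add-one smoothing keeps unseen words from zeroing out a score:

```python
import math
from collections import Counter, defaultdict

# Invented training tweets with labels.
train = [
    ("win free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting moved to noon", "ham"),
    ("see you at the conference", "ham"),
]

def fit(examples):
    """Counting stuff: word frequencies per label, plus label frequencies."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in examples:
        words = text.split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(text, word_counts, label_counts, vocab):
    """Pick the label with the highest log prior + smoothed log likelihood."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing over the full vocabulary.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(train)
print(predict("free prize now", *model))  # classified as spam
```

Every quantity in the model is a count or a ratio of counts, which is why a spreadsheet can do the job.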

The Future: Data Demystified

What is next? It will be easier to get data, easier to clean it, easier to chart it, and by using this data we can more easily predict… predict… predict. But what is it that we actually wanted to predict?

Data demystified does not mean that we have solved the world’s problems. It just means we have better technology that lets us focus on what is important. We work with data because we want to change a behavior or trigger an action. No one said this better than British science historian James Burke during the main Strata plenary session:

Information is causing change… If it is not causing change, it is not information.

He then looked in the audience and said,

No information: you are sitting in a seat.

Information: the person next to you has a miasmal disease.

Yes… We tend to forget this. We are in the data world for a reason: we want to change the world with data. Technology is making the job easier and the tools have become better, but what counts are the results. And those results could also be found at Strata.

For example, Chris Harland from Microsoft explained how he uses data to improve business at bars. He measures the behavior of bar guests, and one of his stunning findings was that Corona beer is a good predictor of higher spending. But dear bar owners, please do not force your customers to drink Corona – that would be mixing up correlation and causation.

Corona is a good predictor for spending behavior. Do not mix up cause and correlation.

Another fascinating (meaning action-oriented) talk was given by Drew Sullivan of the Organized Crime and Corruption Reporting Project. He used data on money movements to show how to detect fraudulent activities in Montenegro.

Now that data is demystified, let’s apply it: turn our businesses around and become data-driven. This means that data scientists should learn more about business, and business people should become more like data scientists. Perhaps a new role might be that of a business scientist?


But do not be fooled: getting the question right is the hardest part. I describe this issue in depth in my book Ask-Measure-Learn (O’Reilly). Take Monica Rogati’s (@mrogati) presentation as an example. She is the famous data scientist who demonstrated, in a well-received talk, that women sleep 20 minutes longer than men on average. Okay, but how does that help me? This insight is amusing at best; the real question is what to do with this kind of information. Knowing Monica, she already has a new data product in mind. Let’s see what she has to say about it at the next Strata. I know I will be there.[/vc_column_text][/vc_column][/vc_row]