Business-Centric “Big Data” Solutions

More and more corporations are adopting an avant-garde strategy that radically redefines how they process information, whether the data is structured, unstructured, or streaming. It is called the “Big Data” solution, and it is not only a cost-effective technology but also a business strategy that capitalizes on information resources. This technology makes it possible to manage and analyze all available data in its native form, so almost every business across the spectrum can benefit tremendously from the data it already has. This data is often referred to as “found data”: the digital exhaust from web searches, mobile pings, credit card payments, social media, and other sources. The data is then aggregated into useful information for better management decisions, such as recognizing waste or knowing when customers are most likely to respond to a promotion. A Big Data solution can also identify what type of promotion it should be, and the list of possibilities goes on.

Two stages in Big Data history

1. Informative, static: “Turn data into information” (OLAP, visualization).
2. Forward-looking, dynamic: “Turn data into forecast.”

The I Know First self-learning algorithm can take the messy collage of data points collected for disparate purposes and not only help assimilate the data, but take it one step further by making reliable daily algorithmic forecasts based on predictive analytics. For instance, the prediction system can help with campaign lead generation by finding the leads most likely to result in incremental telecom sales. Big Data solutions have been shown to be effective: research has indicated that companies leveraging Big Data will financially outperform their peers by 20% or more.

Challenges of “Big Data”

Building these new analytic capabilities that provide proprietary insights into consumer trends is not an easy task. For one thing, there is a tremendous amount of data to analyze, which also presents a physical storage challenge as the volume grows rapidly. And because so much data is added at such a tremendous rate, some of the older data becomes irrelevant, so the advanced algorithms must be able to remove it through aggressive filtering and compression. A Big Data solution is far more effective than single-stream analysis because it can recognize and combine all of the information in the multi-channel world people actually live in, and make these insights actionable. Consumers see TV spots, comment on social media, see search marketing ads, receive newsletters, and are targeted in marketing campaigns. Other components are also imperative to consider, such as geographic location, demographic factors, and social sentiment. The goal of Big Data is to take these disparate streams of information, provide insight into the performance of historical patterns, make abstract connections not otherwise seen, and give direction to the company based on the numbers.

The I Know First Customized Business-Centric “Big Data” Solution

I Know First has its place in the “big data” ecosystem thanks to the strength of its time series analysis algorithm, originally developed for predicting the financial markets. The algorithm is based on Artificial Intelligence and Machine Learning, and incorporates elements of Artificial Neural Networks and Genetic Algorithms. Its use of distributed parallel processing and modular software allows flexibility and rapid scaling to accommodate more inputs, while at the same time compressing the raw data to a manageable size by eliminating non-contributing inputs. This adaptability allows it to quickly learn new patterns.

How do we deal with the large volume of data?

In a tree-like hierarchical fashion:

1. Break the data into smaller chunks (branches) according to the main logical identifiers, then break those into smaller branches, until each is of manageable size.
2. Analyze the data in each branch using our time series algorithm.
3. Reject unimportant variables, further reducing the size of the data; this is done with Principal Component Analysis (PCA).
4. Re-assemble the smaller branches into their larger parent branches.
5. Repeat steps 2 to 4, with each round reducing the data size.
6. Finally, reassemble the now clean and much-reduced data into the main trunk.

All of this is done in an automatic, iterative way, using massively parallel distributed processing.
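As a rough illustration of this branch-and-reduce loop, here is a minimal Python sketch (an assumption-laden example, not I Know First’s actual code): it groups columns by a hypothetical “group_detail” naming scheme, compresses each branch with PCA, and reassembles the retained components. The grouping rule, the 95% variance threshold, and the number of passes are all assumptions made for the example.

```python
# Minimal sketch of hierarchical branch-and-reduce with PCA (illustrative only).
import pandas as pd
from sklearn.decomposition import PCA

def reduce_branch(branch: pd.DataFrame, prefix: str, keep_var: float = 0.95) -> pd.DataFrame:
    """Keep only the principal components explaining `keep_var` of the variance,
    i.e. reject the non-contributing inputs of this branch."""
    pca = PCA(n_components=keep_var)
    reduced = pca.fit_transform(branch.values)
    cols = [f"{prefix}_pc{i}" for i in range(reduced.shape[1])]
    return pd.DataFrame(reduced, index=branch.index, columns=cols)

def reduce_tree(data: pd.DataFrame, rounds: int = 2) -> pd.DataFrame:
    """Break columns into branches by an assumed 'group_detail' naming scheme,
    reduce each branch, reassemble, and repeat; each round shrinks the data."""
    for _ in range(rounds):
        prefixes = sorted({c.split("_")[0] for c in data.columns})
        branches = [
            reduce_branch(data[[c for c in data.columns if c.split("_")[0] == p]], p)
            for p in prefixes
        ]
        data = pd.concat(branches, axis=1)  # reassemble branches into the parent trunk
    return data
```

In the real system the branching keys and reduction methods are richer than this, but the shape of the loop mirrors steps 1 to 6 above: split, reduce, reassemble, repeat.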
This reduction involves several sub-steps; minimal illustrative sketches of the cleaning, clustering, and seasonality ideas follow after these notes.

Data Preparation: Filtering and Cleaning

Flag and remove obvious errors and missing values; the number of cases with unreasonable values should be small, and the data provider should be alerted to the errors. Identify outliers and decide whether they are valid or errors.

Random sampling

In some cases we do not know the logical structure of the data, such as when, instead of a tree, we see bushes of apparently unrelated data. When we do not know how to break the data up logically, randomly chosen subsets of the data can be processed through steps 2 to 5. Eventually the logical pattern emerges, with the same inputs repeatedly showing up as important in different subsets of the data.

Clustering and Classification

In processing the data we will find that some inputs are highly correlated. By clustering them into groups according to similarity, then combining them into classes or removing the redundant ones, we can further reduce the data. We are also developing more sophisticated proprietary data reduction techniques, with the larger goal of leaving only the most relevant inputs that affect the forecast.

Seasonality

One factor to consider when analyzing time series data is seasonality. Our algorithms also look for cycles in the data, which are normally hidden underneath more important events, measuring their periodicity and relative importance.
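To make the filtering-and-cleaning sub-step concrete, a minimal, hypothetical pandas sketch (assuming a purely numeric DataFrame) might look like the following; the negative-value rule and the 4-sigma outlier cut-off are illustrative assumptions, not I Know First’s actual thresholds.

```python
# Illustrative sketch only: flag obvious errors, missing values and outliers.
import pandas as pd

def clean(data: pd.DataFrame, sigma: float = 4.0) -> pd.DataFrame:
    report = {}
    # Remove obviously unreasonable values (example rule: negative prices/volumes);
    # the number of such cases should be small.
    bad = (data < 0).any(axis=1)
    report["unreasonable_rows"] = int(bad.sum())
    data = data[~bad]
    # Drop rows with missing values.
    report["missing_rows"] = int(data.isna().any(axis=1).sum())
    data = data.dropna()
    # Count outliers beyond `sigma` standard deviations for manual review:
    # are they valid extreme events, or data errors?
    z = (data - data.mean()) / data.std(ddof=0)
    report["outlier_cells"] = int((z.abs() > sigma).sum().sum())
    print("Data-quality report to send back to the provider:", report)
    return data
```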
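For the clustering-and-classification sub-step, one simple, illustrative way to drop redundant inputs is to cluster them by correlation and keep a single representative per cluster; the 0.9 correlation threshold below is an assumption for the example.

```python
# Illustrative sketch only: cluster highly correlated inputs, keep one per cluster.
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def drop_redundant(data: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    # Turn absolute correlation into a distance: highly correlated inputs are "close".
    dist = 1.0 - data.corr().abs()
    links = linkage(squareform(dist.values, checks=False), method="average")
    labels = fcluster(links, t=1.0 - threshold, criterion="distance")
    keep = []
    for cluster_id in sorted(set(labels)):
        members = [col for col, lab in zip(data.columns, labels) if lab == cluster_id]
        keep.append(members[0])  # keep a single representative input per cluster
    return data[keep]
```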
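And for the seasonality sub-step, a small sketch of measuring the periodicity and relative importance of hidden cycles could use a periodogram; again, this is a simplified stand-in for the proprietary approach, not a description of it.

```python
# Illustrative sketch only: measure the periodicity and relative strength of hidden cycles.
import numpy as np
from scipy.signal import periodogram

def dominant_cycles(series: np.ndarray, top: int = 3):
    """Return the `top` strongest cycle lengths (in samples) with their share of total power."""
    t = np.arange(len(series))
    detrended = series - np.poly1d(np.polyfit(t, series, 1))(t)  # remove the linear trend
    freqs, power = periodogram(detrended)
    cycles = []
    for i in np.argsort(power)[::-1]:
        if freqs[i] > 0 and len(cycles) < top:
            cycles.append((1.0 / freqs[i], power[i] / power.sum()))
    return cycles  # e.g. a ~7-sample cycle in daily data would suggest weekly seasonality
```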
Learning is part of the reduction process

Understanding, modeling, and evaluation together form a continuous and tedious iterative process that combines data reduction with learning the relationships between the data inputs. The ultimate goal is forecast creation, using the combination of the reduced data and the knowledge of relationships gathered while reducing it.

How do we deal with fast-changing data?

The relationships between the various data inputs change with time. This is true in the financial world: while the basic laws of supply and demand and the drive for higher yield at lower risk do not change, the more subtle econometric relationships can and do change. The same applies to today’s fast-moving multi-channel world. The key to this problem is adaptability. We normally want data sets as large as our computational resources permit, but we should also recognize that old data may not be relevant today. It is a delicate balance: generality versus recent performance. Thus a successful forecasting system should constantly monitor its own performance and learn to adapt to the new reality.

The algorithm basics

Because the algorithm is entirely empirical, any type of data can be fed to the prediction system, and the algorithm will provide a forecast for several time horizons. The predictive algorithm is self-learning, adaptable, scalable, and features a Decision Support System (DSS). Figure 1 (“Big Data Steps”) demonstrates the typical steps taken by companies utilizing the actionable insights provided by Big Data analytics.

The I Know First algorithm is based solely on data and not on any human-derived assumptions. The human factor is only involved in building the mathematical framework and initially presenting to the system the “starting set” of inputs and outputs, chosen by randomized sampling or logical selection. From that point onwards the computer algorithm takes over: it constantly proposes “theories” and tests them automatically on a “training” set of data, then validates them on the most recent data, which prevents over-fitting. Some inputs are rejected because they do not improve the model; other inputs are then substituted. The resulting formula is constantly evolving as new daily data is added and as better machine-proposed “theories” are found. Being entirely empirical, the algorithm is non-parametric: it does not assume any statistical distribution, but measures the distribution of each input as part of the process. The prediction-cycle diagram displays a synopsis of how the algorithm works, using our financial forecasting example.
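As a minimal sketch of this propose-test-validate loop (assuming a simple ridge regression as a stand-in for the machine-proposed “theories”, and a 250-observation recent-validation window), the idea looks roughly like this:

```python
# Illustrative sketch only: propose candidate "theories", train on older data,
# and keep the one that performs best on the most recent (out-of-sample) data.
import numpy as np
from sklearn.linear_model import Ridge

def walk_forward_select(X: np.ndarray, y: np.ndarray, recent: int = 250):
    """Train on all but the `recent` newest observations, validate on those newest ones."""
    X_train, y_train = X[:-recent], y[:-recent]
    X_valid, y_valid = X[-recent:], y[-recent:]
    best_model, best_err = None, np.inf
    # Each candidate "theory" here is just a different regularisation strength;
    # a rejected candidate is simply replaced by the next proposal.
    for alpha in (0.01, 0.1, 1.0, 10.0):
        model = Ridge(alpha=alpha).fit(X_train, y_train)
        err = float(np.mean((model.predict(X_valid) - y_valid) ** 2))
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err
```

In the actual system the candidate “theories” are far richer than a grid of regularization strengths, and inputs that fail to improve validation performance are rejected and substituted, but the train-then-validate-on-the-most-recent-data structure is the part being illustrated.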

The I Know First Customized Business-Centric “Big Data” Solution can be used for applications such as the following, though it is not limited to them:

  • Compliance and regulatory reporting
  • Risk analysis and management
  • Fraud detection and security analytics
  • CRM and customer loyalty programs
  • Credit risk, scoring and analysis
  • High-speed arbitrage trading
  • Trade surveillance
  • Abnormal trading pattern analysis
  • Demand for goods and services
  • Understanding and monetizing customer behavior
  • Discount & advertising targeting – “next best offer”
  • Call center utilization

Contact us to learn how our self-learning algorithm can enable “data monetization” through timelier, more accurate, more complete, more granular, and more frequent decisions with the I Know First Customized Business-Centric Big Data Solutions for your business. Email: [email protected]

Business disclosure: I Know First Research is the analytic branch of I Know First, a financial services firm that specializes in quantitatively predicting the stock market. This article was written by Joshua Martin, an I Know First Research analyst, and Dr. Lipa Roitman, founder of I Know First. We did not receive compensation for this article, and we have no business relationship with any company whose stock is mentioned in this article.