Many organizations start business analytics projects with high energy and hope. But very soon the teams run into problems, and then the blame game starts. Issues such as data not being available, unclean data, extract-transform-load (ETL) interfaces not working, and unsatisfactory visualization keep surfacing on a daily basis. These are the results of improper planning and process.
No documentation about data
In Nov-Dec 2018, more than 200,000 cubic feet of water per second was released from the river Cauvery into the Bay of Bengal. How can the local bodies forget the basics of water storage? Had the water been stored, we would not be seeing water tankers on every street of Chennai today, each with a huge crowd fighting around it!
Control mechanisms for data cleanliness
When data transformation and loading are done in multiple stages, a lot of noise gets introduced. Since this happens outside the OLTP systems, the onus is not on the application provider. The data analytics team is responsible for putting proper checks on data cleanliness in place. Moreover, whenever the OLTP data formats change, these ETL (extract-transform-load) checks must be updated to match.
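As a minimal sketch of what such a cleanliness check could look like at one ETL stage: the field names (order_id, customer_id, order_date, amount), the date format and the rules below are all assumptions made for illustration, not taken from any particular system. The idea is simply that every batch is split into clean rows and rejected rows with reasons, so noise is caught at the stage where it appears.

```python
from datetime import datetime

# Hypothetical required fields for an illustrative "orders" extract.
REQUIRED_FIELDS = ("order_id", "customer_id", "order_date", "amount")


def check_row(row):
    """Return a list of problems found in one extracted row (empty if clean)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if row.get(field) in (None, ""):
            problems.append(f"missing {field}")

    date_text = row.get("order_date")
    if date_text:
        try:
            # Assumed format; update this when the OLTP format changes.
            datetime.strptime(date_text, "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append("bad order_date format")

    amount = row.get("amount")
    if amount not in (None, ""):
        try:
            if float(amount) < 0:
                problems.append("negative amount")
        except (TypeError, ValueError):
            problems.append("non-numeric amount")
    return problems


def validate_batch(rows):
    """Split a batch into clean rows and (row, problems) rejections."""
    clean, rejected = [], []
    seen_ids = set()
    for row in rows:
        problems = check_row(row)
        order_id = row.get("order_id")
        if order_id and order_id in seen_ids:
            problems.append("duplicate order_id")
        if problems:
            rejected.append((row, problems))
        else:
            clean.append(row)
        if order_id:
            seen_ids.add(order_id)
    return clean, rejected
```

Keeping the rules in one small function means that a change in the source format only needs a change in one place, and the rejected list gives the team something concrete to discuss instead of a blame game.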
Security loopholes
OLTP systems implement role-based security along the workflow. When all the data comes to the data warehouse (DWH) and the same kind of security is not established there, insiders may exploit the gap and get a chance to view data that they are not supposed to access. This damages image and trust, and it mostly goes unnoticed.
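To illustrate the point, here is a minimal sketch of carrying role-based restrictions over to the warehouse side. The roles, column names and the in-memory audit list are hypothetical. A real warehouse would normally rely on its own grants, views or row-level security, but the principle is the same: check the role before returning data, and record denied attempts so that they do not go unnoticed.

```python
# Hypothetical mapping of roles to the warehouse columns they may read.
ROLE_COLUMNS = {
    "sales_analyst": {"order_id", "region", "amount"},
    "finance_analyst": {"order_id", "amount", "margin"},
}

# Stand-in for a protected audit table (assumption for this sketch).
audit_log = []


def read_for_role(role, rows, requested_columns):
    """Return the requested columns if the role may see all of them.

    Any request that includes a column outside the role's permissions is
    recorded in the audit log and refused.
    """
    allowed = ROLE_COLUMNS.get(role, set())
    denied = [c for c in requested_columns if c not in allowed]
    if denied:
        audit_log.append({"role": role, "denied": denied})
        raise PermissionError(f"{role} may not read: {', '.join(denied)}")
    return [{c: row[c] for c in requested_columns} for row in rows]
```

An unknown role gets an empty permission set here, so the default is to deny rather than to allow.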
Stale dashboards
When the logic or the data changes, the dashboards change too. When new dashboards are introduced, it is equally important to remove the dashboards that are obsolete or no longer relevant. But this is not done regularly. People often keep using the older dashboards, as they have been used to them for a long time, and then keep questioning the accuracy of the numbers with either the IT team or the business team.
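One way to make this cleanup routine is to keep a small catalog of dashboards and flag the obsolete ones automatically. The sketch below assumes that each dashboard records the version of the business logic it was built against and, optionally, the name of the dashboard that replaced it. Both fields are hypothetical and would have to be maintained by the team.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dashboard:
    name: str
    logic_version: int                 # logic version it was built against (assumed field)
    replaced_by: Optional[str] = None  # name of the newer dashboard, if any


def find_stale(dashboards, current_logic_version):
    """Return names of dashboards that should be retired or reviewed."""
    stale = []
    for dashboard in dashboards:
        superseded = dashboard.replaced_by is not None
        outdated = dashboard.logic_version < current_logic_version
        if superseded or outdated:
            stale.append(dashboard.name)
    return stale
```

Running such a check whenever a new dashboard is released gives a concrete list to retire, rather than leaving the old ones in circulation.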
Logfile pileup
The analytics jobs also write a lot of log details. These need to be cleaned up in the same way as we do for OLTP systems. During extraction and transformation, many temporary extracts are also created, and these need proper cleanup too. If this is not done, the log files and extracts may lead to performance issues due to lack of disk space.
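A minimal sketch of such a cleanup job, using only the Python standard library, is shown below. The file patterns (*.log, *.tmp) and the age-based retention are assumptions. The function defaults to a dry run, so it only reports what it would delete unless told otherwise.

```python
import time
from pathlib import Path


def cleanup_old_files(directory, max_age_days, patterns=("*.log", "*.tmp"), dry_run=True):
    """Delete (or, in a dry run, just list) files older than max_age_days.

    Only files directly inside `directory` that match one of `patterns`
    are considered; subdirectories are left alone.
    """
    cutoff = time.time() - max_age_days * 86400  # seconds in a day
    removed = []
    for pattern in patterns:
        for path in Path(directory).glob(pattern):
            if path.is_file() and path.stat().st_mtime < cutoff:
                if not dry_run:
                    path.unlink()
                removed.append(path)
    return removed
```

Scheduling this alongside the ETL jobs themselves, with a retention period agreed with the business, keeps the disk from filling up silently.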
There are many other problems too; we will go through those in subsequent posts.