The APM industry can only grow, thanks to mobile and IoT traffic. Customers ruthlessly dump slow applications, so companies must invest in APM tools to ensure app speed is not compromised. Unless you monitor continuously, you cannot be prepared when a traffic surge strikes.
If we dissect the layers in a typical application, we see a UI layer (JSP, ASPX, etc.), a business logic layer (Java or C# classes) and a data layer (SQL queries and stored procedures). The network connects these layers across geographies. Most APMs rely on a certain set of core techniques. One such technique is to get deep-dive insight into the time taken by every method/function in Java, .NET or PHP on the server side, in order to find out which class or method is slow. Runtime code is automatically injected into the binaries to record the entry and exit timings of every method.
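To make the technique concrete, here is a minimal sketch of what the injected entry/exit timing code conceptually does. Real APM agents rewrite byte code at class-load time (for example via java.lang.instrument) rather than editing source; the class, method and output shown here are purely illustrative assumptions, not any vendor's actual implementation.

```java
// Illustrative only: the effect of entry/exit instrumentation on one method.
public class OrderService {

    public void placeOrder(String orderId) {
        long start = System.nanoTime();            // injected at method entry
        try {
            doPlaceOrder(orderId);                 // original method body
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // injected at method exit: report method-level timing to the APM backend
            System.out.println("OrderService.placeOrder took " + elapsedMs + " ms");
        }
    }

    private void doPlaceOrder(String orderId) {
        // ... actual business logic ...
    }
}
```

Multiply this by every method of every class, on every call, and you can see where the profiling overhead comes from.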
Now let me ask the hard question. Is this byte code injection (for profiling) absolutely necessary for APMs? Can we find out what slows the app down without this profiling technique? Profiling/sampling adds a certain amount of overhead to collect these method-level execution time statistics, so can I avoid it and still know what actually slows the app down?
If you are familiar with synthetic user and real user monitoring techniques, you know we can find out whether the UI layer takes time. In fact, synthetic user monitoring can give a clear waterfall chart with page- and component-wise timing details. This solves one part of the problem.
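At its core, a synthetic monitor is just a scripted client that hits a page on a schedule and records how long it took. Here is a minimal sketch assuming Java 11+ and its built-in HttpClient; the URL and the 2-second threshold are illustrative assumptions, and a real tool would additionally break the time down into DNS, connect, time-to-first-byte and render phases.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class SyntheticMonitor {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/login"))   // page under test (illustrative)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();

        long start = System.nanoTime();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.printf("GET %s -> HTTP %d in %d ms%n",
                request.uri(), response.statusCode(), elapsedMs);
        if (elapsedMs > 2000) {
            System.out.println("ALERT: front-end response time above threshold");
        }
    }
}
```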
If we use database monitors that can collect slow queries, they will tell us which query was executed, how many times and when, with clear response time details. Thus we can isolate the slow queries and stored procedures. This solves the back-end part of the problem.
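In practice you would lean on the database's own slow-query log or a dedicated database monitor, but the idea can also be sketched at the JDBC level. The connection string, credentials and the 500 ms threshold below are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SlowQueryLogger {
    private static final long SLOW_MS = 500;   // illustrative threshold

    // Times a query and flags it when it exceeds the threshold.
    public static ResultSet timedQuery(Connection conn, String sql) throws Exception {
        PreparedStatement stmt = conn.prepareStatement(sql);
        long start = System.nanoTime();
        ResultSet rs = stmt.executeQuery();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (elapsedMs > SLOW_MS) {
            // A real monitor would send this to a metrics store, not stdout.
            System.out.printf("SLOW QUERY (%d ms): %s%n", elapsedMs, sql);
        }
        return rs;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, for illustration only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db-host/app", "app", "secret")) {
            timedQuery(conn, "SELECT id, total FROM orders WHERE status = 'OPEN'");
        }
    }
}
```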
So we have a solution to identify and isolate slowness at the extreme front end and the extreme back end. How do we get details about the middle layer written in Java or C#? If we use the access logs of Microsoft IIS, Tomcat or JBoss, we can find out how long each URL or web service call took to execute. This execution time includes the time to run the SQL queries as well as the time spent in Java/C# business logic.
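As a sketch of how little is needed here, the snippet below scans a Tomcat access log, assuming the AccessLogValve pattern ends with %D (request processing time; milliseconds up to Tomcat 9, microseconds from Tomcat 10) and that the request line "%r" sits in its usual position. The file path, field positions and 1-second threshold are assumptions for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class AccessLogScanner {
    public static void main(String[] args) throws Exception {
        long thresholdMs = 1000;   // flag requests slower than 1 second

        Files.lines(Paths.get("/var/log/tomcat/localhost_access_log.txt"))
             .forEach(line -> {
                 String[] fields = line.trim().split("\\s+");
                 // Assumed layout: URL path is field 6 (inside the quoted "%r"),
                 // processing time (%D) is the last field.
                 long elapsedMs = Long.parseLong(fields[fields.length - 1]);
                 if (elapsedMs > thresholdMs) {
                     System.out.printf("SLOW URL (%d ms): %s%n", elapsedMs, fields[6]);
                 }
             });
    }
}
```

Since this per-URL time already includes the database time, subtracting the slow-query numbers from the previous step tells you roughly how much the middle layer itself contributes.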
Honestly, most applications today do not have a whole lot of business logic in this middle layer. Rather, it acts as a postman between the UI and the database procedures.
When there is not much heavy logic (such as trajectory calculations, match-making algorithms or shortest-path algorithms), what is the point of collecting timing details for every method, and for every call to that method? Why sacrifice 4-6% of your resources as overhead for this JVM/CLR profiling?
Why pay $200 per month for a profiling APM when we already know that the JVM/CLR layer does not contain much time-consuming logic?
Profiling via byte code injection is an intrusive process.
Net-net, when we can identify and isolate the performance bottlenecks with non-intrusive methods, why should we adopt costly and intrusive profiling APM tools? That too in production?!