Tuesday, 15 January 2013

Performance Analysis - Part 1: Understanding interactions inside IT environments


My first post was about the general process around doing performance analysis in a scientific fashion.

Now I'm going to dive into a process I use to understand large, interconnected IT systems. Having a good mental model of how a system interacts with its components is essential. It's very difficult to form useful hypotheses about problems if you don't have an idea of the data flows and connection interactions involved.

Bear with me, this is a long one, but fear not, there are diagrams!

Needless to say, experience with the software and hardware you're investigating is pretty essential. It's difficult to know what "normal" looks like, if you haven't seen it before!

I always start these processes with a diagram - even if it's in my head, or on a whiteboard.

Now for the diagrams!




Below is a hybrid physical/logical diagram of a very simple LAMP system.



















Here I'm adding some simple information about system specifications. Nothing too technical ;-)










I'm now going to add TCP connection information to the diagram.

  • The client's TCP connection terminates on the loadbalancer.
  • The loadbalancer talks to the PHP servers over a separate TCP connection.
  • The PHP servers talk TCP to MySQL.


This is important because it marks the boundaries between potentially independent moving parts of the system. It's also a reminder that there may be gains to be had from optimising connection overheads and the network stack.

This diagram assumes that there's either no NAT/firewall, or that the loadbalancer is doing it itself. If we ran an independent firewall, or had Layer 3/4 loadbalancing, the TCP connection paths would look a little different.
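If you don't have the diagram to hand, the three TCP connections listed above can also be written down as data. This is only a minimal sketch: the component names and the `TcpSegment` class are made up for illustration, and it assumes the simple case described here, with no independent firewall in the path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpSegment:
    """One independent TCP connection between two components."""
    source: str
    destination: str


# Hypothetical component names; the three segments mirror the list above.
SEGMENTS = [
    TcpSegment("client", "loadbalancer"),
    TcpSegment("loadbalancer", "php"),
    TcpSegment("php", "mysql"),
]


def terminating_components(segments):
    """Return the components where a TCP connection ends, in path order."""
    return [segment.destination for segment in segments]


def is_continuous_path(segments):
    """True if each segment starts where the previous one terminated."""
    return all(a.destination == b.source for a, b in zip(segments, segments[1:]))
```

Each entry in `SEGMENTS` is a boundary where connection setup, keep-alive and pooling settings can be looked at separately from its neighbours.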




Here's some information about the thread pools available on different parts of the system.

This shows a reasonably well matched system in thread-pool terms (for an arbitrary web site workload).

I have in the past encountered some very mismatched configurations, but we'll talk about the effects of getting thread pools wrong in another blog post.
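As a rough illustration of what "matched" could mean, here is a small sketch that compares the total pool size of each tier with the tier behind it. The rule it applies (an upstream tier shouldn't be able to demand more concurrent work than the downstream tier can accept) is a crude rule of thumb I'm assuming for the example, and the tier names and numbers are invented, not taken from the diagram.

```python
# Hypothetical tiers, upstream first: (name, instances, pool size per instance).
# These numbers are made up for illustration only.
TIERS = [
    ("loadbalancer", 1, 300),
    ("php", 2, 150),
    ("mysql", 1, 300),
]


def find_mismatches(tiers):
    """List hops where the upstream tier could demand more concurrent
    work than the downstream tier's pools can accept."""
    mismatches = []
    for upstream, downstream in zip(tiers, tiers[1:]):
        up_name, up_count, up_pool = upstream
        down_name, down_count, down_pool = downstream
        up_total = up_count * up_pool
        down_total = down_count * down_pool
        if up_total > down_total:
            mismatches.append((up_name, down_name, up_total, down_total))
    return mismatches
```

Calling `find_mismatches(TIERS)` on the made-up numbers above returns an empty list. Doubling the PHP instance count without touching MySQL would flag the `php` to `mysql` hop. Real systems queue and time out in ways this ignores, so treat a flagged hop as a question to ask, not a diagnosis.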










And lastly, any application specific information that might be relevant.

This may come from knowledge of the business, from talking to developers, or from direct knowledge of important settings in the applications and infrastructure used.











Still here?

At this point, you should be able to take an imaginary request from a client, and trace the interactions all the way through the system. Bear in mind that your knowledge of this system is far from complete yet, and parts of it may be wrong! This is just a starting point.
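If it helps, the same tracing exercise can be sketched in a few lines of code. The component names and the `NEXT_HOP` mapping are hypothetical, and the sketch assumes the simplest case, where every request goes all the way to MySQL and the response returns along the same path.

```python
# Hypothetical component names for the simple LAMP system in this post.
NEXT_HOP = {
    "client": "loadbalancer",
    "loadbalancer": "php",
    "php": "mysql",
}


def trace_request(start="client"):
    """Return every hop of a request and its response, in order."""
    path = [start]
    # Follow the request downstream until there is no further hop.
    while path[-1] in NEXT_HOP:
        path.append(NEXT_HOP[path[-1]])
    outbound = list(zip(path, path[1:]))
    # The response retraces the same hops in reverse.
    inbound = [(b, a) for a, b in reversed(outbound)]
    return outbound + inbound


for source, destination in trace_request():
    print(f"{source} -> {destination}")
```

Walking the list hop by hop is a handy prompt: for each one, ask what connection, thread pool and application setting is involved.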

Next time I'll walk you through choosing a metric to use as a basis for a performance test. Check back on http://www.jmips.co.uk/blog soon.

