An access log is a log that is generated by the webserver which logs the details about the request that it processed, along with the status code and the response time it took to process. Similar log files can be found in the load balancers as well, which will have similar details stored. While doing any performance analysis, these logs play an important role and it is being neglected most of the time due to lack of awareness. Also, the usage of APM tools abstracts it from the users by allowing them to focus on visually coarse-grained data instead of fine-grained data. Most people are aware of application server logs but many of them are not aware of web server/load balancer access logs.
Access logs are available for the below servers but the format will be different for each of them
- Web servers like Apache, Nginx, etc.
- Load Balancers like AWS ELB
- API Gateways like AWS API Gateway, APIGEE, etc.
- Gateway & Proxy servers
In this blog, I am going to cover Apache access log format, AWS ELB access log format, a real performance issue identified using access log, and other problems that can be investigated using the access log.
Apache Access Log Format
Below is a sample entry taken from the Apache access log:
126.96.36.199 - test [22/Sep/2015:08:46:01 +0400] "GET /test HTTP/1.1" 404 288 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36"
188.8.131.52 – IP address from where the request was made to this server.
test – It is the username of the user who is requesting the information from this server.
[22/Sep/2015:08:46:01 +0400] – Timestamp when the request was received.
"GET /test HTTP/1.1" – GET is the HTTP method used to request the object. /test is the resource requested. HTTP/1.1 is the protocol used.
404 – It is the response code returned to the client.
288 – It is the size of the object/resource returned by the server.
"-" – It is the referrer URL and it is absent in this request.
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36" – It is the browser agent from which the request was made.
ELB Log Access Format
Here is a sample taken from an ELB log:
2019-11-30T15:00:33.293541Z www-test-com 184.108.40.206:46765 220.127.116.11:80 0.000064 0.64636 0.000047 200 200 0 557178 "GET https://www.test.com:443/ HTTP/1.1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
2019-11-30T15:00:33.293541Z – Timestamp when the load balancer generated the response to the client.
www-test-com – Resource ID of the ELB used.
18.104.22.168:46765 – IP address and Port of the client.
22.214.171.124:80 – IP address and port of the target server where the request is processed.
0.000064 – The elapsed time from the time the load balancer received the request until the time it sends the request to a target server.
0.64636 – The elapsed time from the time load balancer sent the request to the target server until the target server starts sending the response headers.
0.000047 – The elapsed time from when the load balancer received the response from the target server until it started sending the response to the client.
200 – Response code sent by the load balancer.
200 – Response code received from the target server.
0 – Size of the bytes received from the client.
557178 – Size of the bytes sent as a response to the client.
"GET https://www.test.com:443/ HTTP/1.1" – It is the request details which is similar to Apache Server log.
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" – It contains the user agent details similar to Apache server log.
ECDHE-RSA-AES128-GCM-SHA256 – It is the SSL cipher suite used in this request.
TLSv1.2 – It is the SSL protocol version used.
For most of the layers in a multi-tiered application, the entry point would be the web server or load balancer. Hence it is important to analyze these log files to check whether the request reached the server or not. If there is no entry then it clearly indicates the request has not reached from the previous layer to this current layer. Then we need to look at the previous layer or the communication between the previous layer and this layer.
In one of the production incidents where the CDN Server was reporting that they received the response code as 0 from the AWS ELB layer. On analyzing the ELB logs it was found that there was no response returned with 0 as the status code. Since the traffic was huge on the application at that particular point in time, the requests were getting timed out at the load balancer itself without waiting for the response from the target server. But this has nothing to do with the 0 response code as the AWS ELB was returning 504 as the status code. Below is the line which shows the entry made during the problem period:
2019-12-11T12:58:32.768181Z www-test-com 126.96.36.199:37280 - -1 -1 -1 504 0 0 0 "GET https://www.test.com:443/ HTTP/1.1" "Apache-HttpClient/2.5.5 (Java/1.7.0_122)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
In the above line,
- ELB returned status code 504
- -1 observed in the elapsed time-related fields as ELB did not receive any response within the time out configured
- 0 for the response code from the target as well as 0 bytes for the response as there is no response received from the target
Later it was found that there was some network issue between the CDN to the AWS ELB layer which resulted in 0 response code returned to the CDN server. This kind of analysis is possible only when we analyze the access log otherwise it can’t be done with any APM tool as it may not be able to show the data to this granular level in a straightforward manner.
Problems that can be diagnosed using access logs:
- HTTP Status code 404 – It can be analyzed and fixed by having the resource in web server or by deleting the reference to the web resource from the web page. This helps in saving the roundtrips made to the server.
- HTTP Status code 500, 502, 503, 504 – These are status code received from the target servers which can be further investigated for issues.
- The user-agent string helps in analyzing the types of devices/browsers from where the request is made. Usually, this helps to identify whether it is a bot attack or not. Some bot attacks will have the same user agent repeated which is not possible by any specific user to hit the server at that frequency.
- SSL cipher & SSL protocol versions can be used to analyze if there are any problems with SSL connectivity.
- Request time out or Gateway timeout kind of issues can be investigated.
Access logs play an important role in tracking the request passing through the layers and seeing where it has not reached. It should not be neglected while investigating any performance issues related to the network and HTTP status/error codes. It gives more meaningful information which will be helpful for different analyses.