Simple HTTP profiling of applications using sysprof

This is a quick write-up of a feature I added last year to libsoup and sysprof which exposes basic information about HTTP/HTTPS requests to sysprof, so they can be visualised in GNOME Builder.

Prerequisites

  • libsoup compiled with sysprof support (-Dsysprof=enabled), which should be the default if libsysprof-capture is available on your system.
  • Your application (and ideally its dependencies) uses libsoup for its HTTP requests; this won’t work with other network libraries.

Instructions

  • Run your application in Builder under sysprof, or under sysprof on the CLI (sysprof-cli -- your-command) then open the resulting capture file in Builder.
  • Ensure the ‘Timings’ row is visible in the ‘Instruments’ list on the left, and that the libsoup row is enabled beneath that.

Results

You should then be able to see a line in the ‘libsoup’ row for each HTTP/HTTPS request your application made. The line indicates the start time and duration of the request (from when the first byte of the request was sent to when the last byte of the response was received).

The details of the event contain the URI which was requested, whether the transaction was HTTP or HTTPS, the number of bytes sent and received, the Last-Modified header, If-Modified-Since header, If-None-Match header and the ETag header (subject to a pending fix).

What’s that useful for?

  • Seeing what requests your application is making, across all its libraries and dependencies — often it’s more than you think, and some of them can easily be optimised out. A request which is not noticeable to you on a fast internet connection will be noticeable to someone on a slower connection with higher request latencies.
  • Quickly checking that all requests are HTTPS rather than HTTP.
  • Quickly checking that all requests from the application set appropriate caching headers (If-Modified-Since, If-None-Match) and that all responses from the server do too (Last-Modified, ETag). If an HTTP request can result in a cache hit, that’s potentially a significant bandwidth saving for the client, and an even bigger one for the server (if it’s seeing the same request from multiple clients).
  • Seeing a high-level overview of what bandwidth your application is using, and which HTTP requests are contributing most to that.
  • Seeing how it all ties in with other resource usage in your application, courtesy of sysprof.
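
The HTTPS and caching-header checks in the list above can also be written down as a small script. This is only a sketch: it assumes you have copied the URI and headers of each request out of the event details by hand, and the `audit` function and its arguments are hypothetical names, not part of sysprof or libsoup.

```python
from urllib.parse import urlsplit


def audit(uri, request_headers, response_headers):
    """Return a list of problems found for one HTTP transaction."""
    # Header names are case-insensitive, so compare them in lower case.
    sent = {name.lower() for name in request_headers}
    received = {name.lower() for name in response_headers}

    problems = []
    if urlsplit(uri).scheme != "https":
        problems.append("not HTTPS")
    if not sent & {"if-modified-since", "if-none-match"}:
        problems.append("request has no conditional header")
    if not received & {"last-modified", "etag"}:
        problems.append("response has no validator header")
    return problems


# Hypothetical data, typed in by hand for illustration.
print(audit("http://example.com/feed", {}, {"ETag": '"abc"'}))
print(audit("https://example.com/feed", {"If-None-Match": '"abc"'}, {"etag": '"abc"'}))
```

A first request for a resource legitimately has no conditional header, so treat that particular warning as a prompt to look closer rather than as an error.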

Yes that seems useful

There’s plenty of scope for building this out into a more fully-featured way of inspecting HTTP requests using sysprof. Doing it from inside the process, using sysprof, rather than from outside, using Wireshark, gives visibility into TLS-encrypted conversations.

GET and POST

One of the most basic features of a website is a form. You can use forms to send data to a website, search for things, or manipulate the URL. Many less experienced web developers will have heard of GET and POST requests, but what are they really, and what are the differences between them?

To explain them, let's go back to fundamentals. Every time you get a web page from a server, your browser sends an HTTP request to the server, and gets a response. A typical HTTP request to retrieve this site is as follows:

GET /index.php?media=rss HTTP/1.1\r\n
Host: tecnocode.co.uk\r\n
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.0.5) Gecko/20060731 Ubuntu/dapper-security Firefox/1.5.0.5\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
Accept-Language: en-gb,en-us;q=0.7,en;q=0.3\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: UTF-8,*\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
Cookie: foo=bar; foo2=bar\r\n
\r\n

I'm not going to explain it all, but basically, it's asking the server for the /index.php?media=rss page (first line) on tecnocode.co.uk (second line). All the other lines are there to detail what can and can't be accepted, and how the connection is going to be handled. Every line ends with \r\n (the escape sequences for a carriage return followed by a newline), and the whole thing is terminated with a line containing only \r\n.
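
To make the line endings concrete, here is a minimal Python sketch that assembles a cut-down version of that request by hand. The `build_get_request` helper is a name invented for this illustration, and only two of the optional headers are included to keep it short.

```python
def build_get_request(host: str, path: str, headers: dict[str, str]) -> bytes:
    """Assemble a raw HTTP/1.1 GET request as bytes."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends in CRLF; one extra empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request(
    "tecnocode.co.uk",
    "/index.php?media=rss",
    {"Accept-Encoding": "gzip,deflate", "Connection": "keep-alive"},
)
# repr() makes the otherwise invisible \r\n sequences visible.
print(repr(request))
```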

It's the first line we're interested in here, as that is the one detailing the fact that this is a GET request. As you can see, GET requests are used to retrieve most pages you download off the web, but you might not realise that they can be used in forms as well.

POST requests are different to GET requests: instead of encoding the parameters from the form in the URL, they encode them in a different part of the request, the body.

POST /index.php?page=login&paragraph=login HTTP/1.1\r\n
Host: tecnocode.co.uk\r\n
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.0.5) Gecko/20060731 Ubuntu/dapper-security Firefox/1.5.0.5\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
Accept-Language: en-gb,en-us;q=0.7,en;q=0.3\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: UTF-8,*\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
Cookie: foo=bar; foo2=bar\r\n
Content-Type: application/x-www-form-urlencoded\r\n
Content-Length: 28\r\n
\r\n
username=DrBob&password=test

You can see here that this request is basically the same as the GET request above, apart from a few minor things. Firstly, the first line says POST instead of GET, and secondly, there are some extra lines at the bottom. Content-Type and Content-Length tell the server the type (encoding) and length of the form data, respectively, then the form data itself is sent; in this case, my username "DrBob", and a fictitious password "test". It's because of this separation of the form data from the URL that POST pages can't be referenced by URL, as a GET request would miss out the POSTed form data.
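
As a rough illustration of where those two extra headers come from, this Python sketch encodes the same form data with the standard library and derives Content-Length from it. The variable names are just for this example, and most of the other headers are left out.

```python
from urllib.parse import urlencode

# The form fields, in the order they appear in the form.
form = {"username": "DrBob", "password": "test"}
body = urlencode(form)

header_lines = [
    "POST /index.php?page=login&paragraph=login HTTP/1.1",
    "Host: tecnocode.co.uk",
    "Content-Type: application/x-www-form-urlencoded",
    # Content-Length is the size of the body in bytes, not characters.
    f"Content-Length: {len(body.encode('ascii'))}",
]
# Headers, then an empty line, then the form data itself.
post_request = "\r\n".join(header_lines) + "\r\n\r\n" + body

print(body)       # username=DrBob&password=test
print(len(body))  # 28
```

Note that the parameters in the URL (page and paragraph) stay where they are; only the form fields travel in the body.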

The next step is understanding how to use both GET and POST requests in forms. You can already make a GET request by using a hyperlink, but that doesn't let you ask the user for input to use as the URL's parameters.

<form action="http://tecnocode.co.uk/" method="get">
	<fieldset>
		<legend>Search terms</legend>
		<label for="query">Enter your search terms. You can "-exclude" keywords.</label>
		<input type="text" name="query" id="query" value="" />
		<input type="hidden" name="page" value="search" />
	</fieldset>
	<fieldset>
		<input type="submit" value="Search" />
	</fieldset>
</form>

Here we have a simple search form, which uses GET. The main difference between this and a POST form is that the method for a GET form is "get". However, there is another difference: if you try to put parameters on the URL in the action attribute, they will be ignored, because the browser replaces the action's query string with the form data. With GET forms, all parameters must be given as <input /> fields; that means moving any action parameters to hidden inputs, as is shown in the example. If that example were submitted with a query of "moo", the URL "http://tecnocode.co.uk/?query=moo&page=search" would be requested (the parameters appear in the same order as the inputs in the form).
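
The way a browser turns a GET form into a URL can be approximated in a few lines of Python. This is a simplified sketch, not a full implementation of form submission, and `get_form_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def get_form_url(action: str, fields: list[tuple[str, str]]) -> str:
    """Approximate the URL a browser requests when a GET form is submitted."""
    parts = urlsplit(action)
    # Any query string already present in the action is discarded and
    # replaced by the encoded form fields, in document order.
    return urlunsplit(parts._replace(query=urlencode(fields)))


# The search form: a text input followed by a hidden input.
print(get_form_url("http://tecnocode.co.uk/", [("query", "moo"), ("page", "search")]))
# -> http://tecnocode.co.uk/?query=moo&page=search

# Parameters placed in the action are lost.
print(get_form_url("http://tecnocode.co.uk/?page=search", [("query", "moo")]))
# -> http://tecnocode.co.uk/?query=moo
```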

<form action="http://tecnocode.co.uk/?page=login&paragraph=login" method="post">
	<fieldset>
		<legend>Username</legend>
		<label for="username">Your unique username.</label>
		<input name="username" id="username" type="text" value="" />
	</fieldset>
	<fieldset>
		<legend>Password</legend>
		<label for="password">Your personal password.</label>
		<input name="password" id="password" type="password" value="" />
	</fieldset>
	<fieldset class="submit">
		<input type="submit" value="Login" />
	</fieldset>
</form>

This form makes a POST request with login details, because its method attribute is "post". With POST forms, parameters can be put into the action URL, because the form data itself will be submitted separately from them. When used with the username "DrBob", and the password "test", this form will generate the POST request used as an example further up. Note that POST does not encrypt or otherwise protect the inputs: the password field is transmitted in plain text, because this form is submitted over HTTP, as opposed to HTTPS. HTTPS requires an SSL/TLS certificate, which cost a lot of money when this was written.

So when should you use POST, and when should you use GET? Well, the best way to remember is that POST requests should change the state of the server. That is, they should trigger some action which results in future page requests returning pages which are different to those returned before the POST. A good example of this would be adding a news item to a site. GET requests are used for every hyperlink in a site, but they shouldn't be limited to those. For example, if you have a search function on your website, you will most likely want a search form to feed it. However, this form should not use POST! It does not change the state of the server, and thus should make a GET request. Another benefit of using GET requests for tasks such as these is that the user's browser doesn't ask them whether they want to re-submit the data when they refresh, and they can bookmark or share the URL of the returned page without it appearing differently for other people.
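
The rule can be shown with a toy model of a site in Python. Everything here is made up for illustration (the `handle` function, the "search" and "add_news" page names, and the in-memory list standing in for the server's state); it is not how any particular web framework works.

```python
# The "server state": a list of news item titles.
news_items: list[str] = ["Site launched"]


def handle(method: str, page: str, params: dict[str, str]) -> list[str]:
    """Dispatch a request; only POST is allowed to modify news_items."""
    if method == "GET" and page == "search":
        # Read-only: repeating this request never changes anything.
        query = params["query"].lower()
        return [item for item in news_items if query in item.lower()]
    if method == "POST" and page == "add_news":
        # State-changing: future GET requests will see the new item.
        news_items.append(params["title"])
        return list(news_items)
    raise ValueError(f"unsupported request: {method} {page}")


first = handle("GET", "search", {"query": "site"})
second = handle("GET", "search", {"query": "site"})  # same result as first
handle("POST", "add_news", {"title": "New site design"})
third = handle("GET", "search", {"query": "site"})   # now includes the new item
print(first, second, third)
```

Refreshing the search is harmless, which is why the browser doesn't need to warn about it; repeating the POST would add a second copy of the news item.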

Some common situations for using GET forms follow:

  • Search
  • Selecting something to view/edit/delete/etc. out of a list
  • Navigating pages (pagination)
  • Selecting the page's stylesheet/view mode (e.g. switching to debug mode on a large web application)
  • Selecting a file mirror for a download

Although it's probably obvious, a list of common POST form usages follows:

  • Adding/Editing/Deleting an item
  • Logging into a site
  • Logging out of a site (yes, this changes the site's state)
  • Submitting a comment or trackback

Just remember: POST changes the server's state. ;)