Date Posted: 17th Dec 2009 at 11:32 AM
Hi All,
I've been doing a lot of work behind the scenes on the general coding of the site, figuring out ways of improving the speed of day-to-day usage.
First off, how do we identify areas to speed up? Well, I have a number of ways of doing that. Firstly, there is a script I can run that tells me, in real time, which pages people are accessing, based on the session information (I'll sketch how that tally works after the breakdown). I get a report like this:
forumdisplay.php => 13
register.php => 14
wiki.php => 19
archive => 21
printthread.php => 25
showpost.php => 35
member.php => 57
index.php => 80
download.php (Type : Other download) => 87
download.php (Type 3: Mesh Recolour) => 88
download.php (Type 1: Maxis Recolour) => 116
showthread.php => 126
download.php (Type 2: New Mesh) => 154
download.php (Type 0: Other download) => 204
browse.php => 568
download.php total: => 656
Total: 1659
Here you can see the 2 main pages on the site clearly: download.php and browse.php. As you probably know, download.php is the page that shows you the individual downloads here on the site, and browse.php is the Download Browser that helps you find downloads.
So obviously I know that improving the speed and responsiveness of these 2 pages will benefit nearly 2/3rds of the people visiting and using the site.
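(For the curious: a tally like the one above can come from a single GROUP BY over the live session data. Below is a minimal sketch of the idea - it assumes a vBulletin-style session table with a location column holding the current URL; the real script, and the per-download-type breakdown, is a bit more involved.)

<?php
// Sketch only: tally which scripts the current sessions are on.
// Assumes a vBulletin-style "session" table with a "location" column.
$db = new mysqli('localhost', 'user', 'pass', 'forum');

$counts = array();
$result = $db->query("SELECT location, COUNT(*) AS hits FROM session GROUP BY location");
while ($row = $result->fetch_assoc()) {
    // Reduce "/showthread.php?t=123" down to "showthread.php"
    $script = basename(parse_url($row['location'], PHP_URL_PATH));
    $counts[$script] = (isset($counts[$script]) ? $counts[$script] : 0) + $row['hits'];
}

asort($counts); // ascending, like the report above
foreach ($counts as $script => $hits) {
    echo "$script => $hits\n";
}
echo "Total: " . array_sum($counts) . "\n";
?>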
What I have been working on lately is converting the debugging code I used to use into a combined Firebug/FirePHP system that's activated only for specific people. This allows me, when I browse the site, to see exactly what queries are being run, the memory usage, and so on.
Obviously, to get to that point a lot of code has to be modified: the MySQL layer has to be changed, and debugging code has to be placed inside the various scripts I am debugging. But the payoff is worth it: granular debugging with SQL EXPLAINs, timings, and memory usage, for any page on the site.
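Simplified a lot, the query side of that looks something like the sketch below. It uses the FirePHPCore library; the debug_query() wrapper and the user-id gate are made up for illustration, not the actual site code.

<?php
// Sketch: route query timings, EXPLAINs and memory usage to FirePHP,
// but only for specific people. Not the actual site code.
require_once 'FirePHPCore/FirePHP.class.php';
ob_start(); // FirePHP delivers its data via HTTP headers

$debug_userids  = array(1);  // who gets to see the debug output
$current_userid = 1;         // would normally come from the session
$debugging = in_array($current_userid, $debug_userids);
$firephp   = FirePHP::getInstance(true);

function debug_query($db, $sql) {
    global $debugging, $firephp;
    $start  = microtime(true);
    $result = $db->query($sql);
    if ($debugging) {
        $firephp->log(sprintf('%.4fs  %s', microtime(true) - $start, $sql), 'query');
        // EXPLAIN the same statement to see the query plan (SELECTs only)
        if (stripos(ltrim($sql), 'SELECT') === 0) {
            $explain = $db->query('EXPLAIN ' . $sql);
            while ($row = $explain->fetch_assoc()) {
                $firephp->log($row, 'explain');
            }
        }
        $firephp->log(memory_get_usage(true), 'memory_bytes');
    }
    return $result;
}
?>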
Armed with this info, I then basically do the following: find a page on the site and reload it multiple times, comparing the information I see in the debugging output with what's on the page. This lets me spot information that doesn't change often, and can therefore be optimised.
So, for example, on the pages that show information about a download, we know that things like the EP icons, the meshes used, any recolours of the mesh, and other information like that don't actually change very often (unless the creator changes them). So this is information we can easily cache.
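In code, that's basically the standard memcached check-then-fill pattern. A stripped-down sketch - the key name, the TTL and the meshes table are all illustrative, not the site's real schema:

<?php
// Sketch: cache the seldom-changing extras shown on a download page.
$memcache = new Memcached();
$memcache->addServer('localhost', 11211);

function get_download_extras($db, $memcache, $downloadid) {
    $key    = 'dl_extras_' . (int)$downloadid;
    $extras = $memcache->get($key);
    if ($extras === false) {
        // Cache miss: rebuild from the database (illustrative query/table)
        $extras = array();
        $result = $db->query('SELECT meshid, title FROM meshes WHERE downloadid = ' . (int)$downloadid);
        while ($row = $result->fetch_assoc()) {
            $extras[] = $row;
        }
        $memcache->set($key, $extras, 3600); // keep for an hour
    }
    // When the creator edits the upload, the edit code deletes $key
    // so the next view rebuilds it.
    return $extras;
}
?>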
What I then looked at is the information that gets taken from the database for every user on the site, on every page refresh - namely, that user's information. Speeding up the session handling and limiting the amount of information coming from the database means fewer queries, and less load on the database server. So I split off the language information into a separate system and merged it with the user information afterwards, rather than grabbing it all in one lump. I also cache the user information for 1 day, or until it changes.
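Roughly, that looks like the sketch below - the key and table names are made up, but the shape (1-day TTL, language merged in afterwards, explicit invalidation on change) is the important part:

<?php
// Sketch: per-user cache with a 1-day TTL, dropped early on change.
function get_userinfo($db, $memcache, $userid) {
    $key  = 'userinfo_' . (int)$userid;
    $user = $memcache->get($key);
    if ($user === false) {
        $result = $db->query('SELECT * FROM user WHERE userid = ' . (int)$userid);
        $user   = $result->fetch_assoc();
        $memcache->set($key, $user, 86400); // 1 day
    }
    // Language lives in its own, rarely-changing cache entry and is
    // merged in here, rather than fetched in the same lump as the user.
    $user['language'] = $memcache->get('lang_' . $user['languageid']);
    return $user;
}

// Called from the profile save code, so stale data never lingers a day.
function invalidate_userinfo($memcache, $userid) {
    $memcache->delete('userinfo_' . (int)$userid);
}
?>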
This is useful because the next optimisation is limiting how much user information and post information comes from the database. If we look at the example of a thread or download, we have 2 main things: the individual posts, and the information about the user who made each post. In a lot of cases, when a user has made multiple posts in a thread (say, the creator comments on some feedback), we have multiple copies of the same information. In a normal vBulletin installation, this information is retrieved for every post, even if it's the same as the last post by that user.
So, to speed this up, I basically split the page query in 2 - grab the post details, and grab the user details. By grabbing the user details separately we only get the details that are distinct on that page - 100 posts by the same user only results in 1 user's information being retrieved, rather than the same information repeated 100 times. Obviously this is lighter on the database - especially when you use memcached to cache that user's information across threads (think of a creator who has posted multiple things).
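In sketch form (column names follow vBulletin conventions, but this is illustrative rather than the real code):

<?php
// Sketch: two passes - posts first, then each distinct poster exactly once.
// Assumes $db is a connected mysqli handle.
$threadid = 12345;

$posts   = array();
$userids = array();
$result  = $db->query('SELECT postid, userid, pagetext, dateline
                       FROM post WHERE threadid = ' . (int)$threadid . '
                       ORDER BY dateline');
while ($row = $result->fetch_assoc()) {
    $posts[] = $row;
    $userids[$row['userid']] = true; // dedupe as we go
}

// One row per distinct poster - and each of these lookups can itself
// come out of memcached, via something like get_userinfo() above.
$users  = array();
$result = $db->query('SELECT userid, username, usertitle
                      FROM user WHERE userid IN (' . implode(',', array_keys($userids)) . ')');
while ($row = $result->fetch_assoc()) {
    $users[$row['userid']] = $row;
}

// 100 posts by the same user all reuse the one $users entry.
foreach ($posts as $post) {
    $user = $users[$post['userid']];
    // ... render $post with $user ...
}
?>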
All this results in a much faster and much less database-intensive page load - for the average guest visiting a popular thread, we are only talking about 4 or 5 queries, at most.
For the browse pages, I added much more aggressive caching of the results you expect to get back from the database, but there is not nearly as much work done here as on the showthread pages - it's still an area for improvement.
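The shape of that caching, sketched below: key the cache on the normalised filter set, so identical browses share a result. run_browse_query() and the TTL are placeholders, not the real code.

<?php
// Sketch: cache browse results per filter combination.
function get_browse_results($db, $memcache, array $filters) {
    ksort($filters); // normalise ordering so equivalent filter sets share a key
    $key     = 'browse_' . md5(serialize($filters));
    $results = $memcache->get($key);
    if ($results === false) {
        $results = run_browse_query($db, $filters); // placeholder: the expensive bit
        // Short TTL - listings change as new uploads arrive, so we trade
        // a little freshness for a much lighter database.
        $memcache->set($key, $results, 300);
    }
    return $results;
}
?>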
Overall, using memcached, and optimising what gets pulled from the database and how that information is used, is key to making the site faster. But how much faster? Well, as an example, between December 2nd and December 3rd pageviews jumped from 555,998 to 1,140,027 - more than double. Since then the daily minimum has gone up to 1.4 million, with a maximum of 1.8 million. This kind of jump is pretty impressive, but it can be better.
At the end of the day though, a faster site is a better site, and I know there is more work to be done in this area. It is, however, an interesting one, and one that I don't expect many (if any) people reading this to understand. If you do, well, then at least I have an audience, albeit small.