Open Query is now three years old! We initially started with consulting and training services, and extended this with our proactive subscriptions that also offers system administration and monitoring.
So how is it going? Pretty well. We’ve been profitable from the start, without funding (beyond a few hundred $ startup costs paid by Arjen) or any credit – by choice. Our objective has never been to grow ridiculously in terms of revenue or number of customers, we simply charge reasonable prices for real service. Right now we have dozens of clients on an ongoing basis, a neat trickle of new clients, and Open Query sustains the livelyhood and lifestyle of a number of people.
For me (Arjen), the three year mark is particularly interesting, since most startups do not make it past their first two years. With our different approach to doing business, we’ve seen our fair share of skepticism. Not that we mind, if anything it’s encouragement
If you’d like to learn more about our business principles, see the Upstarta site.
This weekend, I decided to attend BarcampJB pretty last minute. Lucky for me, barcamps are made for chaotics like me, so it was no problem at all. I found some friends that live here in Kuala Lumpur who I drove down to JB with (JB is around a 5 hour drive from KL, we did it in 3.5 ).
The camp was very interesting. Because JB is on the border with Singapore, there’s a good crossover between Malaysian and Singaporean techies.
I decided to go all out and give three talks on Saturday: First up was the MMM talk I’ve given at a few conferences before. All went well, and later on in the day some people approached me for more in-depth questions. It still seems that people have this idea in their head that they somehow need MySQL Cluster when there is more then one machine involved. When I explain them that that is very rarely the case and they can achieve what they want with MMM as well, they are often happy to hear that.
My next talk was more of a personal development one. People keep asking me here where I am from. When I explain to them that I’ve been location independent for the last 3 years, they are usually very eager to find out how I pull that off. I decided to summarise my experiences and put them in a talk. This talk was very well attended and I loved giving it. Most of the attendants were young techies, they are usually in a perfect position to do something very similar to what I’m doing.
The last talk was a lightning talk on Zabbix, the Open Source monitoring system we use at Open Query. Quick, and dirty, but effective.
Other interesting talks I attended were on breeze, an online banking application made for Standard Chartered bank that looks very slick and usable (If anyone from my bank is reading this: get with the program and fix our banking application to enter the 21st century please ).
Conary and Foresight Linux were interesting as well. Conary (the package management system in Foresight Linux) is not quite mature yet, but definitely a very interesting technology. I was interested to hear about it and hope to see it become more mainstream in the future.
Daniel Cerventus gave a good lightning talk on what not to do as a startup. The main message was to just do it, and not wait for grant money or VC’s. Some solid tips as well, one of them being to run your potential name through Namechk, a handy potential username checker for many services.
There was obviously also a lot of networking and we went for a foot massage at the end of the day. Funny fact: I was the only one to stay awake through the massage (Even though I am narcoleptic), while two of my friends (who I won’t name here ) snored all the way through it
All in all another succesful tech event in Malaysia. Definitely one of the many reasons I love living here!
Open Query now has its own @openquery account on Twitter and Identi.ca so you can conveniently follow us there for announcements and tips – and also ask us questions! All OQ engineers can post/reply. The OQ site front page also tracks this feed.
Previously I was posting from my personal @arjenlentz account with #openquery hashtag, but that’s obviously less practical.
With the exhaustion of IPv4 address space looming sometime in 2012; probably earlier rather than later, it makes sense to ease on into IPv6 land. Without straying into tunnel broking and endpoint shenanigans 6to4 is a method of wrapping up IPv6 inside of IPv4.
6to4 performs three functions:
Allocates an IPv6 address block to any host/network that has a global IPv4 address.
Wraps up IPv6 packets inside IPv4 packets for transmission over IPv4 using 6in4 (traffic is sent over IPv4 inside IPv4 packets whose IP headers have the IP protocol number set to 41; IPv6-in-IPv4. ) 6to4 makes use of IP protocol 41 too, but instead of static endpoints, the endpoint IPv4 address is sourced from IPv6 addresses within the IPv6 packet header.
Routes traffic between 6to4 and “native” IPv6 networks.
As such its pretty easy to implement, especially on our good friend Debian (and its better looking cousin Ubuntu).
I am going to step through setting up a Debian host at Linode.
Step 1 Check your Kernel
Now, the first caveat is that you must be running a 2.6.20+ kernel (At the time of writing the latest linode kernel for Debian was : 2.6 Paravirt (2.6.34-x86_64-linode)). The default ‘Etch’ release kernel (2.6.18) supports IPv6 but woefully implements IPv6 stateful connection tracking, which is just not good enough for a decent firewall. If you have a look under your Linode Configuration Profile you can see what Kernel you are running, and change it to one that is supported; obviously a reboot would be in order if you change it. The linode kernels have IPV6 support compiled in.
But here is the quick way to check whether IPV6 is compiled in, if the following fails IPv6 is either not compiled in or the module has not been loaded:
#!/bin/bash
/sbin/modprobe.real "$@"
ret=$?
if [ "$1" == "-Q" ] ; then
exit 0
fi
exit $ret
Step 2 Calculate your new IPv6 address
Any IPv6 address that begins with the 2002::/16 prefix is known as a 6to4 address, as opposed to a native IPv6 address which does not use that prefix. The Internet Assigned Numbers Authority (IANA: www.iana.org) has set aside this address space just for 6to4. IPv6 addresses are assigned based upon your IPv4 address; for instance, 74.207.254.16 would become 2002:4acf:fe10::/48
We need some tools to help us calculate our IPV6 address, luckily there is a package for this
$ sudo apt-get install ipv6calc
Now its a matter of plugging in your IPv4 address into ipv6calc to determine your reserved IPv6 address range.
You get given an address range with a prefix length of 48 bits, which leaves room for a 16-bit subnet field and a 64 bit host address within the subnet.
Step 3 Update your interface configuration
You now need to edit your network configuration file /etc/network/interfaces file
auto tun6to4
iface tun6to4 inet6 v4tunnel
address 2002:4acf:fe10::1
netmask 16
gateway ::192.88.99.1
endpoint any
local 74.207.254.16 #fits address
auto tun6to4 # make sure this interface comes up on boot
iface tun6to4 inet6 v4tunnel
address 2002:4acf:fe10::1 #first host in this address range
netmask 16
gateway ::192.88.99.1 #special anycast address for 6to4 (2002:c058:6301::)
endpoint any
local 74.207.254.16
mtu 1472 #The MTU is therefore the normal Ethernet MTU (1500) minus the headers used on the tunnel.
ttl 255
Restart your interfaces (not recommended):
$sudo /etc/init.d/networking restart
If you want to be a little but more careful and not wipe out all networking if something goes wrong (eg you are using Ubuntu or IPv6 is not available), you could just bring up the new interface:
$sudo ifup tun6to4
Step 4 Update IPv6 Firewall script/rules
Now it’s fairly important (read as critical) to firewall IPv6 stuff as it is with IPv4. Here is a small sample of a firewall that will at the very least not leave you hanging in the breeze. Needless to say you can add your own rules and make this as complex as you need.
# Initialize all the chains by removing all the rules
iptables --flush
iptables -t nat --flush
iptables -t mangle --flush
ip6tables --flush
ip6tables -t mangle --flush
# The loopback interface should accept all traffic
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
ip6tables -A INPUT -i lo -j ACCEPT
ip6tables -A OUTPUT -o lo -j ACCEPT
#Allow IPV6 packets to come over the tunnel
iptables -A INPUT -p ipv6 -i eth0 -j ACCEPT
iptables -A OUTPUT -p ipv6 -o eth0 -j ACCEPT
# Allow outbound DNS queries from the FW and the replies too
iptables -A INPUT -j ACCEPT -m state --state ESTABLISHED,RELATED -i eth0 -p tcp
ip6tables -A INPUT -j ACCEPT -m state --state ESTABLISHED,RELATED -i tun6to4 -p tcp
# The policy should be to drop it
iptables -A INPUT -j DROP
iptables -A OUTPUT -j DROP
iptables -A FORWARD -j DROP
ip6tables -A INPUT -j DROP
ip6tables -A OUTPUT -j DROP
ip6tables -A FORWARD -j DROP
I usually create a directory called /etc/iptables (owner root:root / permissions 750) and drop firewall up and down scripts in there.
Then it is a simple matter of adding the following scripts to the bottom of your eth0 interface definition stanza in /etc/network/interfaces to invoke them on boot or whenever:
pre-up /etc/iptables/firewall_up.sh
post-down /etc/iptables/firewall_down.sh
pre-up /etc/iptables/firewall_up.sh
post-down /etc/iptables/firewall_down.sh
IMPORTANT: Just a quick note don’t block icmpv6 because it is the glue that holds IPv6 together.
Step 5 Setup Forward DNS
I am not going to over explain this one because everyone has an opinion on how to setup DNS but in essence you need to add a line like this to your zone file. There are plenty of articles outlining this stuff.
hyosine AAAA 2002:4acf:fe10::1
Step 6 Setup Reverse DNS
You now need to setup reverse DNS for your address, so using our example of 2002:4acf:fe10 you will have to configure the zone of ”0.0.0.0.0.1.e.f.f.c.a.4.2.0.0.2.ip6.arpa” in your name servers. The zone should have PTR records for your hosts just like an in-addr.arpa zone for IPv4, but with hex digits of the IPv6 address backwards, separated by dots. Using our example, the 6to4 host will have a ::1 suffix, so a reverse DNS record looks like:
PING hyosine.cloudcaster.com(2002:4acf:fe10::1) 56 data bytes
64 bytes from 2002:4acf:fe10::1: icmp_seq=1 ttl=60 time=1.41 ms
64 bytes from 2002:4acf:fe10::1: icmp_seq=2 ttl=60 time=1.34 ms
$ ping6 hyosine.openquery.com
PING hyosine.openquery.com(2002:4acf:fe10::1) 56 data bytes
64 bytes from 2002:4acf:fe10::1: icmp_seq=1 ttl=60 time=1.41 ms
64 bytes from 2002:4acf:fe10::1: icmp_seq=2 ttl=60 time=1.34 ms
Lastly, you need to register this zone and its servers with the 6to4 reverse zone authority. Note that when you visit that site, you’ll get an SSL certificate warning. This is normal. You need to visit this site using IPv6 from the actual 6to4 zone you’re trying to register. Follow the form to set up the nameservers for the zone and that’s it!
Some time ago I attended the “Optimisation by Design” course from Open Query¹. In it, Arjen teaches how writing better queries and schemas can make your database access much faster (and more reliable). One such way of optimising things is by adding appropriate query hints or flags. These hints are magic strings that control how a server executes a query or how it returns results.
An example of such a hint is SQL_CALC_FOUND_ROWS. You use it in a select query with a LIMIT clause. It instructs the server to select a limited numbers of rows, but also to calculate the total number of rows that would have been returned without the limit clause in place. That total number of rows is stored in a session variable, which can be retrieved via SELECT FOUND_ROWS(); That simply reads the variable and clears it on the server, it doesn’t actually have to look at any table or index data, so it’s very fast.
This is useful when queries are used to generate pages of data where a user can click a specific page number or click previous/next page. In this case you need the total number of rows to determine how many pages you need to generate links for.
The traditional way is to first run a SELECT COUNT(*) query and then select the rows you want, with LIMIT. If you don’t use a WHERE clause in your query, this can be pretty fast on MyISAM, as it has a magic variable that contains the number of rows in a table. On InnoDB however, which is my storage engine of choice, there is no such variable and consequently it’s not pretty fast.
Paging Drupal
At DrupalConSF earlier this year I’d floated the idea of making Drupal 7 use SQL_CALC_FOUND_ROWS in its pager queries. These are queries generated specifically to display paginated lists of content and the API to do this is pretty straightforward. To do it I needed to add query hint support to the MySQL driver. When it turned out that PostgreSQL and Oracle also support query hints though, the aim became adding hint support for all database drivers.
That’s now done, though only the patch only implements hints on the pager under MySQL at the moment.
One issue keeps cropping up though, a blog by Alexey Kovyrin in 2007 that states SELECT COUNT(*) is faster than using SQL_CALC_FOUND_ROWS. It’s all very well to not have a patch accepted if that statement is correct, but in my experience that is in fact not the case. In my experience the stats are in fact the other way around, SQL_CALC_FOUND_ROWS is nearly always faster than SELECT COUNT(*).
To back up my claims I thought I should run some benchmarks.
I picked the Drupal pager query that lists content (nodes) on the content administration page. It selects node IDs from the node table with a WHERE clause which filters by the content language. Or, in plain SQL, what currently happens is:
SELECT COUNT(*) FROM node WHERE language = 'und';
SELECT nid FROM node WHERE language = 'und' LIMIT 0,50;
and what I’d like to happen is:
SELECT SQL_CALC_FOUND_ROWS nid FROM node WHERE language = 'und' LIMIT 0,50;
SELECT FOUND_ROWS();
Methodology
I ran two sets of tests. One on a node table with 5,000 rows and one with 200,000 rows. For each of these table sizes I ran a pager with 10, 20, 50, 100 and 200 loops, each time increasing the offset by 50; effectively paging through the table. I ran all these using both MyISAM and InnoDB as the storage engine for the node table and I ran them on two machines. One was my desktop, a dual core Athlon X2 5600 with 4Gb of RAM and the other is a single core Xen virtual machine with 512Mb of RAM.
I was hoping to also run tests with 10,000,000 rows, but the virtual machine did not complete any of the queries. So I ran these on my desktop machine only. Again for 10, 20, 50, 100 and 200 queries per run. First with an offset of 50, then with an offset of 10,000. I restarted the MySQL server between each run. To discount query cache advantages, I ran all tests with the query cache disabled. The script I used is attached at the bottom of this post. The calculated times do include the latency of client/server communication, though all tests ran via the local socket connection.
My desktop runs an OurDelta mysql .5.0.87 (the -d10-ourdelta-sail66) to be exact. The virtual machine runs 5.0.87 (-d10-ourdelta65). Before you complain that not running a vanilla MySQL invalidates the results, I run these because I am able to tweak InnoDB a bit more, so the I/O write load on the virtual machine is somewhat reduced compared to the vanilla MySQL.
Results
The graphs show that using SQL_CALC_FOUND_ROWS is virtually always faster than running two queries that each need to look at actual data. Even when using MyISAM. As the database gets bigger, the speed advantage of SQL_CALC_FOUND_ROWS increases. At the 10,000,000 row mark, it’s consistently about twice as fast.
Also interesting is that InnoDB seems significantly slower than MyISAM on the shorter runs. I say seems, because (especially with the 10,000,000 row table) the delay is caused by InnoDB first loading the table from disk into its buffer pool. In the spreadsheet you can see the first query takes up to 40 seconds, whilst subsequent ones are much faster. The MyISAM data is still in the OS file cache, so it doesn’t have that delay on the first query. Because I use innodb_flush_method=O_DIRECT, the InnoDB data is not kept in the OS file cache.
Conclusion
So, it’s official. COUNT(*) is dead, long live SQL_CALC_FOUND_ROWS! :-)
I’ve attached my raw results as a Gnumeric document, so feel free to peruse them. The test script I’ve used is also attached, so you can re-run the benchmark on your own systems if you wish.
Conclusion Addendum
As pointed out in the Drupal pager issue that caused me to run these tests, the query I’m benchmarking uses the language column, which is not indexed and the test also doesn’t allow the server to cache the COUNT(*) query. I’ve rerun the tests with 10 million rows after adding an index and I no longer get a signification speed difference between the two ways of getting the total number of rows.
So I suppose that at least SQL_CALC_FOUND_ROWS will cause your non-indexed pager queries to suck a lot less than they might otherwise and it won’t hurt if they are properly indexed
So this is about a SELECT COUNT(*) FROM tblname without a WHERE clause. MyISAM has an optimisation for that since it maintains a rowcount for each table. InnoDB and PBXT can’t do that (at least not easily) because of their multi-versioned nature… different transactions may see a different number of rows for the table table!
So, it’s kinda known but nevertheless often ignored that this operation on InnoDB is costly in terms of time; what InnoDB has to do to figure out the exact number of rows is scan the primary key and just tally. Of course it’s faster if it doesn’t have to read a lot of the blocks from disk (i.e. smaller dataset or a large enough buffer pool).
I was curious about PBXT’s performance on this, and behold it appears to be quite a bit faster! For a table with 50 million rows, PBXT took about 20 minutes whereas the same table in InnoDB took 30 minutes. Interesting!
From those numbers [addendum: yes I do realise there's something else wrong on that server to take that long, but it'd be slow regardless] you can tell that doing the query at all is not an efficient thing to do, and definitely not something a frontend web page should be doing. Usually you just need a ballpark figure so running the query in a cron job and putting the value into memcached (or just an include file) will work well in such cases.
If you do use a WHERE clause, all engines (including MyISAM) are in the same boat… they might be able to use an index to filter on the conditions – but the bigger the table, the more work it is for the engine. PBXT being faster than InnoDB for this task makes it potentially interesting for reporting purposes as well, where otherwise you might consider using MyISAM – we generally recommend using a separate reporting slave with particular settings anyway (fewer connections but larger session-specific buffers), but it’s good to have extra choices for the task.
(In case you didn’t know, it’s ok for a slave to use a different engine from a master – so you can really make use of that ability for specialised tasks such as reporting.)
With Paul McCullagh’s PBXT storage engine getting integrated into MariaDB 5.1, it’s never been easier to it out. So we have, on a slave off one of our own production systems which gets lots of inserts from our Zabbix monitoring system.
That’s possibly an ideal usage profile, since PBXT is a log based engine (simplistically stated, it indexes its transaction logs, rather than rewriting data from log into index and indexing that) so it should require less disk I/O than say InnoDB. And that means it should be particularly suited to for instance logging, which have lots of inserts on a sustained basis. Note that for short insert burst you may not see a difference with InnoDB because of caching, but sustain it and then you can notice.
Because PBXT has such different/distinct architecture there’s a lot of learning involved. Together with Paul and help from Roland Bouman we also created a stored procedure that can calculate the optimal average row size for PBXT, and even ALTER TABLE statements you can paste to convert tables. The AVG_ROW_LENGTH option is quite critical with PBXT, if set too big (or if you let PBXT guess and it gets it wrong) it’ll eat heaps more diskspace as well as being much slower, and if too small it’ll be slower also; this, it needs to be in the right ballpark. For existing datasets it can be calculated, so that’s what we’ve worked on. The procs will be published shortly, and Paul will also put them in with the rest of the PBXT files.
Another important aspect for PBXT is having sufficient cache memory allocated, otherwise operations can take much much longer. While the exact “cause” is different, one would notice similar performance aspects when using InnoDB on larger datasets and buffers that are too small for the purpose.
So, while using or converting some tables to PBXT takes a bit of consideration, effort and learning, it appears to be dealing with the real world very well so far – and that’s a testament to Paul’s experience. Paul is also very responsive to questions. As we gain more experience, it is our intent to try PBXT for some of our clients that have operational needs that might be a particularly good fit for PBXT.
I should also mention that it is possible to have a consistent transaction between PBXT, InnoDB and the binary log, because of the 2-phase commit (XA) infrastructure. This means that you should even be able to do a mysqldump with –single-transaction if you have both PBXT and InnoDB tables, and acquire a consistent snapshot!
When asking about up-time requirements set down in SLAs (Service Level Agreements) with our clients’ clients, we’d hear anything ranging from hours to the familiar five nines, but these days also simply 100% and otherwise penalties apply. From my perspective, there’s not much difference between five nines and 100%, 99.999% uptime over a year amounts to a maximum of little over 5 minutes outage. In many cases, this includes scheduled outages!
So, we can just not have any outages, scheduled or otherwise. Emergency support is not going to help here, because however fast and good they are, you’re already in serious penalty time or well on your way to not having a business any more. Most will respond within say 30 minutes but then need up to a few hours to resolve the issue. That won’t help you, really, will it? And in any case, how are you going to do your maintenance? The answer is, you need to architect things differently.
I do appreciate the issue of transitioning from the corporate tradition of outsourcing the liability along with emergency support, e.g. someone to call and if need be sue… it takes time both in business processes as well as in actual architecture to make things resilient. But really, if those are the SLAs you agree on with your clients, that’s what has to be done.
Anyway, aiming for resilience (expecting things to break but building infra so that it can cope with it) rather than purchasing many-9s is I think a better focus. This because making an individual component even more reliable becomes prohibitively expensive, whereas having more servers is relatively cheap. That’s simple economics.
In my time at MySQL AB in the Community Relations possition (2004-2006) I wrote several articles on MySQL’s licensing for the MySQL web site. The core reason for having to explain anything was (and still is) the dual licensing of MySQL, in particular the client library. I left MySQL AB years ago, but people still ask me licensing questions. Below is an excerpt from one such question, and my response.
> Hi, Found a post on the mysql website from Arjen Lentz to do with the whole > mysql licensing question. > Do you know if the issue with, php scripts (that use a mysql database) issued > under a proprietary license require you to have a commercial license for > mysql, or will the issues be covered for the GPL version through the fact > that the scripts run via php which in-turn connects to the GPL mysql server > for which the FOSS exception applies.
Note: I am not a lawyer; this is not legal advice.
The issue might be a bit fuzzy since you are actually dependent on MySQL server, whether or not you are “linking”. So the linkage could be there anyway (there’s no consensus on this interpretation of “linking”, it is however the viewpoint of some – hence the fuzzyness).
My recommendation to you would be to not fuss with any nasty licensing for the PHP code you create for clients. While this provides the client with more freedom, you are the expert and thus the first choice for any support and future development. Providing clients with freedom tends to bind them more to you, while restrictions tend to make them look around for alternatives.
Your clients are in whatever business they’re in, which is probably not PHP code development; it’s not in their interest to go spend time on that or undermining you, unless you were to provide bad service.
If you approach your software in this way with your clients, you can generally GPL it and do equal or better business while not having to worry about nettly licensing questions. You don’t want to base your business on a legal argument, as you just don’t want the question to get raised to begin with… it’d be costly and distracting (if not destructive).