<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open Query blog &#187; Software and tools</title>
	<atom:link href="http://openquery.com/blog/category/software-and-tools/feed" rel="self" type="application/rss+xml" />
	<link>http://openquery.com/blog</link>
	<description>About MySQL, Drizzle, MariaDB and more!</description>
	<lastBuildDate>Wed, 07 Dec 2011 04:00:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Green HDs and RAID Arrays</title>
		<link>http://openquery.com/blog/green-hds-raid-arrays</link>
		<comments>http://openquery.com/blog/green-hds-raid-arrays#comments</comments>
		<pubDate>Mon, 26 Sep 2011 02:37:22 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[array]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[Green]]></category>
		<category><![CDATA[harddisk]]></category>
		<category><![CDATA[hd]]></category>
		<category><![CDATA[HDD]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[raid]]></category>
		<category><![CDATA[SAS]]></category>
		<category><![CDATA[SATA]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1562</guid>
		<description><![CDATA[Some so-called &#8220;Green&#8221; harddisks don&#8217;t like being in a RAID array. These are primarily SATA drives, and they gain their green credentials by being able to reduce their RPM when not in use, as well as through other aggressive power management trickery. That&#8217;s all cool and in a way desirable &#8211; we want our hardware to use [...]]]></description>
			<content:encoded><![CDATA[<p>Some so-called &#8220;Green&#8221; harddisks don&#8217;t like being in a RAID array. These are primarily SATA drives, and they gain their green credentials by being able to reduce their RPM when not in use, as well as through other aggressive power management trickery. That&#8217;s all cool and in a way desirable &#8211; we want our hardware to use less power whenever possible! &#8211; but the time it takes some drives to &#8220;wake up&#8221; again is longer than a RAID setup is willing to tolerate.</p>
<p>First of all, you may wonder why I bother with SATA disks at all for RAID. I&#8217;ve written about this before, but they simply deliver plenty for much less money. Higher RPM doesn&#8217;t necessarily help you for a db-related (random access) workload, and for tasks like backups speed may not be a primary concern. SATA disks have a shorter command queue than SAS, so they might need to seek more &#8211; however, a smart RAID controller would already arrange its I/O in such a way as to optimise that.</p>
<p>The particular application where I tripped over Green disks was a backup array using software RAID10. Yep, a cheap setup &#8211; the objective is to have lots of diskspace with resilience, and access speed is not a requirement.</p>
<p>Not all Green HDs are the same. Western Digital ones allow their settings to be changed, although that does need a DOS tool (just a bit of a pest using a USB stick with FreeDOS and the WD tool, but it&#8217;s doable), whereas Seagate has decided to restrict their Green models such that they don&#8217;t accept any APM commands and can&#8217;t change their configuration.</p>
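<p>For drives that do accept APM commands, the power management level can be inspected and relaxed from Linux with hdparm. What follows is only a minimal sketch, assuming hdparm is installed and that <code>/dev/sdX</code> stands in for the real device. The helper name <code>apm_commands</code> and the 120 second timeout are made up for illustration, and the helper merely prints the commands so they can be reviewed before being run as root:</p>

```shell
#!/bin/sh
# Print (not run) commands that relax power management on one disk.
# apm_commands is a hypothetical helper; /dev/sdX is a placeholder device.
apm_commands() {
    dev=$1
    name=${dev##*/}                 # /dev/sdb -> sdb
    echo "hdparm -B $dev"           # query the current APM level
    echo "hdparm -B 255 $dev"       # 255 asks the drive to disable APM
    echo "hdparm -S 0 $dev"         # 0 disables the standby (spin-down) timer
    # Give the kernel more patience before a command is considered timed out.
    # 120 seconds is an arbitrary illustrative value, not a recommendation.
    echo "echo 120 > /sys/block/$name/device/timeout"
}

apm_commands /dev/sdX
```

<p>A drive that refuses APM commands will simply report an error on the <code>-B</code> calls, in which case only the kernel-side timeout remains adjustable.</p>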
<p>I&#8217;ve now replaced Seagates with (non-Green) Hitachi drives, and I&#8217;m told that Samsung disks are also ok.</p>
<p>So this is something to keep in mind when looking at SATA RAID arrays. I also think it might be a topic that the Linux software RAID code could address &#8211; if it were &#8220;Green HD aware&#8221; it could a) make sure that they don&#8217;t go to a state that is unacceptable, and b) be tolerant with their response time &#8211; this could be configurable. Obviously, some applications of RAID have higher demands than others, not all are the same.</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/green-hds-raid-arrays/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LexisNexis open sources code for Hadoop alternative</title>
		<link>http://openquery.com/blog/lexisnexis-open-sources-code-for-hadoop-alternative</link>
		<comments>http://openquery.com/blog/lexisnexis-open-sources-code-for-hadoop-alternative#comments</comments>
		<pubDate>Sat, 10 Sep 2011 10:36:20 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[cluster]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hpcc]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/lexisnexis-open-sources-code-for-hadoop-alternative</guid>
		<description><![CDATA[http://gigaom.com/cloud/lexisnexis-open-sources-code-for-hadoop-alternative/ HPCC Systems has released the open source code of its data-processing software that it&#8217;s positioning as a better version of Hadoop. The code is available on Github, and it marks the commencement of HPCC Systems&#8217; quest to build a community of developers underneath Hadoop&#8217;s expansive shadow.]]></description>
			<content:encoded><![CDATA[<div class="wdqs wdqs_link wdqs-link-container">
<p class="wdqs-link-to-source"><a href="http://gigaom.com/cloud/lexisnexis-open-sources-code-for-hadoop-alternative/" target="_blank">http://gigaom.com/cloud/lexisnexis-open-sources-code-for-hadoop-alternative/</a></p>

<div class="wdqs-thumbnail-container"><a href="http://gigaom.com/cloud/lexisnexis-open-sources-code-for-hadoop-alternative/" target="_blank"><img src="http://gigaom2.files.wordpress.com/2011/09/img_hpcc_arch.jpg?w=604&amp;h=317" alt="" width="483" height="254" /></a></div>
<div class="wdqs-text-container">

HPCC Systems has released the open source code of its data-processing software that it&#8217;s positioning as a better version of Hadoop. The code is available on Github, and it marks the commencement of HPCC Systems&#8217; quest to build a community of developers underneath Hadoop&#8217;s expansive shadow.

</div>
</div>]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/lexisnexis-open-sources-code-for-hadoop-alternative/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Stick Figure Guide to the Advanced Encryption Standard (AES)</title>
		<link>http://openquery.com/blog/a-stick-figure-guide-to-the-advanced-encryption-standard-aes</link>
		<comments>http://openquery.com/blog/a-stick-figure-guide-to-the-advanced-encryption-standard-aes#comments</comments>
		<pubDate>Wed, 24 Aug 2011 00:24:42 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[encryption]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/a-stick-figure-guide-to-the-advanced-encryption-standard-aes</guid>
		<description><![CDATA[http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html Jeff Moser on software development]]></description>
			<content:encoded><![CDATA[<div class="wdqs wdqs_link wdqs-link-container">
<p class="wdqs-link-to-source"><a href="http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html" target="_blank">http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html</a></p>

<div class="wdqs-thumbnail-container"><a href="http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html" target="_blank"><img src="http://3.bp.blogspot.com/_Zfbv3mHcYrc/SreS30GKZdI/AAAAAAAABiE/mSpYbOwJdYI/s576/aes_act_1_scene_01_intro_576.png" alt="" width="346" height="271" /></a></div>
<div class="wdqs-text-container">

Jeff Moser on software development

</div>
</div>]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/a-stick-figure-guide-to-the-advanced-encryption-standard-aes/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HDlatency &#8211; now with quick option</title>
		<link>http://openquery.com/blog/hdlatency-quick-option</link>
		<comments>http://openquery.com/blog/hdlatency-quick-option#comments</comments>
		<pubDate>Thu, 16 Jun 2011 08:31:07 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[hdlatency]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[iscsi]]></category>
		<category><![CDATA[latency]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[raid]]></category>
		<category><![CDATA[SAN]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1508</guid>
		<description><![CDATA[I&#8217;ve done a minor update to the hdlatency tool (get it from Launchpad), it now has a &#8211;quick option to have it only do its tests with 16KB blocks rather than a whole range of sizes. This is much quicker, and 16KB is the InnoDB page size so it&#8217;s the most relevant for MySQL/MariaDB deployments. [...]]]></description>
			<content:encoded><![CDATA[I&#8217;ve done a minor update to the hdlatency tool (<a href="https://launchpad.net/hdlatency">get it from Launchpad</a>): it now has a --quick option to have it only do its tests with 16KB blocks rather than a whole range of sizes. This is much quicker, and 16KB is the InnoDB page size so it&#8217;s the most relevant for MySQL/MariaDB deployments.

However, I didn&#8217;t just remove the other stuff, because it can be very helpful in tracking down problems and putting misconceptions to rest. On SANs (and local RAID of course) you have things like block sizes and stripe sizes, and opinions on what might be faster. Interestingly, the real world doesn&#8217;t always agree with the opinions.

As Mark Callaghan correctly pointed out when I first published it, hdlatency does not provide anything new in terms of functionality; the db I/O tests of sysbench cover it all. A key advantage of hdlatency is that it doesn&#8217;t have any dependencies: it&#8217;s a small single piece of C code that&#8217;ll compile and run on very minimalistic environments. We often don&#8217;t control the base environment we have to work on, which is why hdlatency was initially written. It&#8217;s just a quick little tool that does the job.

We find hdlatency particularly useful for comparing environments, primarily at the same client. For instance, the client might consider moving from one storage solution to another &#8211; well, in that case it&#8217;s useful to know whether we can expect an actual performance benefit.

The burst data rate (big sequential read or write) which often gets quoted for a SAN or even an individual disk is of little interest for database use, since a database&#8217;s key performance bottleneck lies in random access I/O: the disk head(s) will need to move. So it&#8217;s important to get some real, relevant numbers, rather than just go with magic vendor numbers that don&#8217;t apply to your workload. Also, you can have a fast storage system attached via a slow interface, and consequently the performance will not be at all what you&#8217;d want to see. It can be quite bad.
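
To make the difference between burst rate and random access concrete, here is a rough sketch of the kind of access pattern that matters: single 16KB reads at random offsets within a file. This is not how hdlatency itself is implemented, just an illustration using dd. The function name random_read_16k is made up, and because the reads go through the page cache, the test file should be much larger than RAM (or, with GNU dd, iflag=direct can be added) for the timing to mean anything.

```shell
#!/bin/sh
# Read COUNT randomly placed 16KB blocks from FILE and print the bytes read.
# random_read_16k is a hypothetical helper, not part of hdlatency.
random_read_16k() {
    file=$1
    count=$2
    size=$(wc -c "$file" | awk '{print $1}')
    blocks=$((size / 16384))
    [ "$blocks" -gt 0 ] || { echo "file is smaller than 16KB" >&2; return 1; }
    total=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # Two random bytes from the kernel give a number in 0..65535,
        # so only the first 1GB of a larger file is ever sampled.
        r=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
        n=$(dd if="$file" bs=16384 skip=$((r % blocks)) count=1 2>/dev/null | wc -c | tr -d ' ')
        total=$((total + n))
        i=$((i + 1))
    done
    echo "$total"
}
```

In bash, wrapping a call in time (for instance time random_read_16k /path/to/bigfile 1000, where the path is a placeholder) gives a crude per-read latency once divided by the count.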

To get an absolute baseline on what are sane numbers, run hdlatency also on a local desktop HD. This may seem odd, but you might well encounter storage systems that show a lower performance than that. &#8216;nuf said.

If you&#8217;re willing to share, I&#8217;d be quite interested in seeing some (--quick) output data from you &#8211; just make sure you say what storage it is: type of interface, etc. Simply drop it in a comment to this post, so it can benefit more people. Thanks!]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/hdlatency-quick-option/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>MySQL data backup: going beyond mysqldump</title>
		<link>http://openquery.com/blog/mysql-data-backup-mysqldump</link>
		<comments>http://openquery.com/blog/mysql-data-backup-mysqldump#comments</comments>
		<pubDate>Tue, 29 Mar 2011 00:25:12 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mmm]]></category>
		<category><![CDATA[mysqldump]]></category>
		<category><![CDATA[recovery]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[restore]]></category>
		<category><![CDATA[xtrabackup]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1446</guid>
		<description><![CDATA[A user on a linux user group mailing list asked about this, and I was one of the people replying. Re-posting here as I reckon it&#8217;s of wider interest. &#62; [...] tens of gigs of data in MySQL databases. &#62; Some in memory tables, some MyISAM, a fair bit InnoDB. According to my &#62; understanding, [...]]]></description>
			<content:encoded><![CDATA[A user on a linux user group mailing list asked about this, and I was one of the people replying. Re-posting here as I reckon it&#8217;s of wider interest.

&gt; [...] tens of gigs of data in MySQL databases.
&gt; Some in memory tables, some MyISAM, a fair bit InnoDB. According to my
&gt; understanding, when one doesn&#8217;t have several hours to take a DB
&gt; offline and do dbbackup, there was/is ibbackup from InnoBase.. but now
&gt; that MySQL and InnoBase have both been &#8216;Oracle Enterprised&#8217;, said
&gt; product is now restricted to MySQL Enterprise customers..
&gt;
&gt; Some quick searching has suggested Percona XtraBackup as a potential
&gt; FOSS alternative.
&gt; What backup techniques do people employ around these parts for backups
&gt; of large mixed MySQL data sets where downtime *must* be minimised?
&gt;
&gt; Has your backup plan ever been put to the test?

You should put it to the test regularly, not just when it&#8217;s needed.
An untested backup is not really a backup, I think.
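
Short of an actual restore test, a cheap first check is whether a dump finished at all. The sketch below relies on the "-- Dump completed" trailer that mysqldump writes as the last line of its output when comments are enabled (the default); the function name dump_is_complete is made up for illustration. It says nothing about whether the data restores cleanly, so it complements a real restore test rather than replacing it.

```shell
#!/bin/sh
# Succeed only if FILE ends with the trailer mysqldump writes on completion.
# dump_is_complete is a hypothetical helper; it assumes the dump was taken
# with comments enabled, otherwise the trailer is absent.
dump_is_complete() {
    tail -n 1 "$1" | grep -q '^-- Dump completed'
}
```

For a compressed dump, the same check can be fed from gzip -dc piped into tail -n 1.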

At <a href="http://openquery.com/" target="_blank">Open Query</a> we tend to use dual master setups with MMM, other replication slaves, mysqldump, and XtraBackup or LVM snapshots. It&#8217;s not just about having backups, but also about general resilience, maintenance options, and scalability. I&#8217;ll clarify:
<ul>
	<li>XtraBackup and LVM give you physical backups. That&#8217;s nice if you want to recover or clone a complete instance as-is. But if anything is wrong, it&#8217;ll be all stuffed (that is, you can sometimes recover InnoDB tablespaces and there are tools for it, but time may not be on your side). Note that LVM cannot snapshot between multiple volumes consistently, so if you have your InnoDB ibdata/IBD files and iblog files on separate spindles, using LVM is not suitable.</li>
	<li>mysqldump for logical (SQL) backups. Most if not all setups should have this. Even if the file(s) were to be corrupted, they&#8217;re still readable since it&#8217;s plain SQL. You can do partial restores, which is handy in some cases. It&#8217;ll be slower to load, so having *only* an SQL dump of a larger dataset is not a good idea.</li>
	<li>Some of the above backups can and should *also* be copied off-site. That&#8217;s for extra safety, but in terms of recovery speed it may not be optimal and should not be relied upon.</li>
	<li>Dual masters are for easier maintenance without scheduled outages, as well as resilience when for instance hardware breaks (and it does).</li>
	<li>Slaves. You can even delay a slave (Maatkit has a tool for this), so that would give you a live correct image even in case of a user error, provided you get to it in time. Also, you want enough slack in your infra to be able to initialise a new slave off an existing one. Scaling up at a time when high load is already occurring can become painful if your infra is not prepared for it.</li>
</ul>
<strong>A key issue to consider is this&#8230; if the dataset is sufficiently large, and the online requirements high enough, you can&#8217;t afford to just have backups. Why? Because, how quickly can you deploy new suitable hardware, install OS, do restore, validate, put back online?</strong>

<em>In many cases one or more aspects of the above list simply take too long, so my summary would be &#8220;then you don&#8217;t really have a backup&#8221;. Clients tend to argue with me on that, but only fairly briefly, until they see the point: if a restore takes longer than you can afford, that backup mechanism is unsuitable.</em>
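
The arithmetic behind that point is simple enough to sketch. The numbers below (a 50GB dataset loading at 20MB/s) are purely hypothetical; the real throughput is something you only learn by timing an actual restore, and it covers the load step alone, not hardware deployment, OS install or validation. The function name restore_minutes is likewise made up.

```shell
#!/bin/sh
# Estimate restore time in whole minutes (rounded up) from dataset size in GB
# and sustained load throughput in MB/s. Both inputs are assumptions that
# should come from a measured test restore.
restore_minutes() {
    size_gb=$1
    mb_per_s=$2
    secs=$((size_gb * 1024 / mb_per_s))
    echo $(( (secs + 59) / 60 ))
}

restore_minutes 50 20   # prints 43
```

If the result for just this one step already exceeds the outage the business can tolerate, the backup on its own does not meet the requirement.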

So, we use a combination of tools and approaches depending on needs, but in general terms we aim for keeping the overall environment online (individual machines can and will fail! relying on a magic box or SAN to not fail *will* get you bitten) to vastly reduce the instances where an actual restore is required.
Separate test/staging servers are part of that picture too, so that developers don&#8217;t stuff around on live servers (human error is an important cause of hassles).

In our training modules, we&#8217;ve combined the backups, recovery and replication topics as it&#8217;s clearly all intertwined and overlapping. Discussing backup techniques separate from replication and dual master setups makes no sense to us. It needs to be put in place with an overall vision.

Note that a SAN is not a backup strategy. And neither is replication on its own.]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/mysql-data-backup-mysqldump/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Importing a file dumped from MySQL with mysqldump into drizzle</title>
		<link>http://openquery.com/blog/importing-file-dumped-mysql-mysqldump-drizzle</link>
		<comments>http://openquery.com/blog/importing-file-dumped-mysql-mysqldump-drizzle#comments</comments>
		<pubDate>Thu, 17 Mar 2011 16:03:24 +0000</pubDate>
		<dc:creator>Walter Heck</dc:creator>
				<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[drizzle]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1438</guid>
		<description><![CDATA[As a big fan of new technology, we try to keep up to date with what&#8217;s happening in the industry. As such, I decided to start using drizzle on my development machine since they announced GA this week. First exercise: import a file dumped from a MySQL server I don&#8217;t have access to into drizzle. [...]]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste">As big fans of new technology, we try to keep up to date with what&#8217;s happening in the industry. As such, I decided to start using <a href="http://drizzle.org">drizzle</a> on my development machine since they <a href="http://blog.drizzle.org/2011/03/15/drizzle-2011-03-12-ga-tarball-has-been-released/">announced GA</a> this week.</div>
<div></div>
<div id="_mcePaste">First exercise: import a file dumped from a MySQL server I don&#8217;t have access to into drizzle. Normally, you can use drizzledump on the mysql server and make it dump a drizzle compatible file. Not in this case, so I decided to sed my way through the various errors. Not pretty, and I hope that at some point we&#8217;ll have a tool that can convert a mysqldump into a drizzle compatible file, but it works for now.</div>
<div></div>
<div>Here&#8217;s what I had to do. Note that this is by no means complete, nor does it come with any guarantees; it&#8217;s just a starting point.</div>
<pre># This file started by setting a SQL_MODE. That doesn't exist in 
# drizzle, so we comment it out
sed -i "s/^SET SQL_MODE/#SET SQL_MODE/g" mysqldump.sql 

# The create database statement set a default character set. 
# Everything in drizzle is UTF8, so let's lose it!
sed -i "s/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci//g" mysqldump.sql 

# The table definitions mentioned a default character set. 
# Everything in drizzle is UTF8, so let's lose it!
sed -i 's/DEFAULT CHARSET=utf8//g' mysqldump.sql 

# No MyISAM except for temporary tables, so away with it.
sed -i 's/ENGINE=MyISAM//g' mysqldump.sql 

# Invalid timestamps are not accepted in drizzle, so this should be a null 
# value. Since some of the columns in this file are actually NOT NULL defined, 
# for now I just set those dates to 1970. UGLY, but works for me. Don't do this 
# on anything that will ever go anywhere near production though!
sed -i "s/'0000-00-00/'1970-01-01/g" mysqldump.sql 

# tinyint doesn't exist anymore, so just replace with integer. Note that you'll 
# have to do this for all data types that no longer exist in drizzle.
# [0-9]* rather than .* keeps the match from running on to a later ")" on the line.
sed -i "s/tinyint([0-9]*)/integer/g" mysqldump.sql</pre>
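<div>Once the substitutions have run, a quick grep can show whether anything they target is still in the file. This is only a rough sketch: the function name <code>leftovers</code> is made up, the pattern list simply mirrors the sed commands above, and it may also flag matching text inside row data.</div>

```shell
#!/bin/sh
# Print (with line numbers) lines that still contain constructs the sed
# commands target. Exit status is 0 if leftovers were found, 1 if none.
# leftovers is a hypothetical helper name.
leftovers() {
    grep -n -i -E '^SET SQL_MODE|CHARACTER SET|CHARSET=|ENGINE=MyISAM|0000-00-00|tinyint' "$1"
}
```

<div>Running <code>leftovers mysqldump.sql</code> before attempting the import gives a short list of lines to inspect by hand.</div>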
<div>Hope this helps others!</div>]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/importing-file-dumped-mysql-mysqldump-drizzle/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A day in the life of Datacenter Disasters</title>
		<link>http://openquery.com/blog/day-life-datacenter-disasters</link>
		<comments>http://openquery.com/blog/day-life-datacenter-disasters#comments</comments>
		<pubDate>Tue, 23 Nov 2010 05:07:13 +0000</pubDate>
		<dc:creator>Walter Heck</dc:creator>
				<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[disaster]]></category>
		<category><![CDATA[mmm]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1409</guid>
		<description><![CDATA[Open Query currently hosts a large part of our infrastructure at Linode. We are extremely happy with their performance, stability and support. Unfortunately any chain is only as strong as its weakest link. This week, there was a major thunderstorm near the Hurricane Electric datacenter (anyone else think that name is funny in combination with [...]]]></description>
			<content:encoded><![CDATA[Open Query currently hosts a large part of our infrastructure at Linode. We are extremely happy with their performance, stability and support. Unfortunately any chain is only as strong as its weakest link. This week, there was a major thunderstorm near the Hurricane Electric datacenter in Fremont (anyone else think that name is funny in combination with the event in question?) and, through a massive power surge, most of HE&#8217;s datacenter lost power. Among the Linodes affected in our infrastructure were all of the machines involved in our MMM setup.

The masters came back up before the monitor, which is around the time I was alerted. Logging in, I noticed replication was broken on one of the masters, but the other master seemed healthy. Since the monitor was not up and it seemed like it could potentially be hours before it would be, I decided it was time for manual action. Since our MMM setup doesn&#8217;t have slaves currently, I decided a good option would be to mimic MMM and move the virtual IP to the healthy server.

I executed the following manual commands to make the desired changes:
<pre>$ ip addr add &lt;virtip&gt; dev eth0</pre>
<pre>$ /usr/sbin/arping -I eth0 -c 5 &lt;virtip&gt;</pre>
That brought all our applications back online, which was the desired effect. I manually fixed replication by repositioning the masters. A while later, the monitor came up and automatically took over, bringing everything back to normal.

Everything went well, but it wasn&#8217;t until the next morning I realised there was a possible flaw in my logic (it didn&#8217;t affect us, but I wanted to blog about it to make others aware). When replication stopped, master A was active. My commands above made master B the active master. Now, in theory it is possible that writes were sent to master A after replication broke. Statements subsequently sent to master B would presume those writes had been executed there, which they had not, because replication never delivered them. This is one of those niche occasions where data drift can occur without anyone noticing.
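
A small pre-check along these lines might help before moving a virtual IP by hand. It is only a sketch and not part of MMM: the function name replication_ok is made up, and it merely reads the output of SHOW SLAVE STATUS\G and reports whether both replication threads say Yes. It cannot detect writes that never left the other master, so it narrows the risk described above rather than removing it.

```shell
#!/bin/sh
# Read `SHOW SLAVE STATUS\G` output on stdin and succeed only if both the
# I/O and SQL replication threads report Yes.
# replication_ok is a hypothetical helper, not an MMM command.
replication_ok() {
    awk -F': ' '
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

On the server that is about to receive the virtual IP it could be used as mysql -e 'SHOW SLAVE STATUS\G' | replication_ok, with credentials supplied however your environment normally does that.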

My recommendation is to not do what I did unless you are very certain your setup doesn&#8217;t suffer from this potential problem. If you do decide to use this trick, however, make sure to use the Maatkit tools mk-table-checksum and mk-table-sync when all is well again to check for (and correct!) data drift.]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/day-life-datacenter-disasters/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PBXT early impressions in production use</title>
		<link>http://openquery.com/blog/pbxt-early-impressions-production</link>
		<comments>http://openquery.com/blog/pbxt-early-impressions-production#comments</comments>
		<pubDate>Thu, 27 May 2010 02:03:19 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[pbxt]]></category>
		<category><![CDATA[storage engine]]></category>
		<category><![CDATA[XA]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1257</guid>
		<description><![CDATA[With Paul McCullagh&#8217;s PBXT storage engine getting integrated into MariaDB 5.1, it&#8217;s never been easier to try it out. So we have, on a slave off one of our own production systems which gets lots of inserts from our Zabbix monitoring system. That&#8217;s possibly an ideal usage profile, since PBXT is a log based engine (simplistically [...]]]></description>
			<content:encoded><![CDATA[With Paul McCullagh&#8217;s <a href="http://primebase.org" target="_blank">PBXT</a> storage engine getting integrated into <a href="http://askmonty.org" target="_blank">MariaDB 5.1</a>, it&#8217;s never been easier to try it out. So we have, on a slave off one of our own production systems which gets lots of inserts from our Zabbix monitoring system.

That&#8217;s possibly an ideal usage profile, since PBXT is a log based engine (simplistically stated, it indexes its transaction logs, rather than rewriting data from log into index and indexing that) so it should require less disk I/O than, say, InnoDB. That means it should be particularly suited to, for instance, logging workloads, which have lots of inserts on a sustained basis. Note that for short insert bursts you may not see a difference with InnoDB because of caching, but sustain the load and the difference becomes noticeable.

Because PBXT has such a different/distinct architecture, there&#8217;s a lot of learning involved. Together with Paul and help from Roland Bouman we also created a stored procedure that can calculate the optimal average row size for PBXT, and even ALTER TABLE statements you can paste to convert tables. The AVG_ROW_LENGTH option is quite critical with PBXT: if set too big (or if you let PBXT guess and it gets it wrong) it&#8217;ll eat heaps more diskspace as well as being much slower, and if too small it&#8217;ll be slower also; thus, it needs to be in the right ballpark. For existing datasets it can be calculated, so that&#8217;s what we&#8217;ve worked on. The procs will be published shortly, and Paul will also put them in with the rest of the PBXT files.
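
To give an idea of what such a conversion statement looks like, here is a tiny sketch that prints one. It is not the stored procedure mentioned above; the helper name pbxt_alter is made up, and the row length of 48 bytes is an arbitrary placeholder rather than a calculated value for any real table.

```shell
#!/bin/sh
# Print an ALTER TABLE statement converting TABLE to PBXT with a given
# average row length in bytes. pbxt_alter is a hypothetical helper and the
# row length must come from your own calculation.
pbxt_alter() {
    printf 'ALTER TABLE `%s` ENGINE=PBXT AVG_ROW_LENGTH=%s;\n' "$1" "$2"
}

pbxt_alter history 48
```

The printed statement can be reviewed and then pasted into a mysql client session on the slave being tested.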

Another important aspect for PBXT is having sufficient cache memory allocated, otherwise operations can take much much longer. While the exact &#8220;cause&#8221; is different, one would notice similar performance aspects when using InnoDB on larger datasets and buffers that are too small for the purpose.

So, while using or converting some tables to PBXT takes a bit of consideration, effort and learning, it appears to be dealing with the real world very well so far &#8211; and that&#8217;s a testament to Paul&#8217;s experience. Paul is also very responsive to questions. As we gain more experience, it is our intent to try PBXT for some of our clients that have operational needs that might be a particularly good fit for PBXT.

I should also mention that it is possible to have a consistent transaction between PBXT, InnoDB and the binary log, because of the 2-phase commit (XA) infrastructure. This means that you should even be able to do a mysqldump with --single-transaction if you have both PBXT and InnoDB tables, and acquire a consistent snapshot!

More experiences and details to come.]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/pbxt-early-impressions-production/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Quest for Resilience: Multi-DC Masters</title>
		<link>http://openquery.com/blog/quest-resilience-multidc-masters</link>
		<comments>http://openquery.com/blog/quest-resilience-multidc-masters#comments</comments>
		<pubDate>Fri, 14 May 2010 01:07:57 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[datacentre]]></category>
		<category><![CDATA[DRBD]]></category>
		<category><![CDATA[failover]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mmm]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[resilience]]></category>
		<category><![CDATA[SAN]]></category>
		<category><![CDATA[VM]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1241</guid>
		<description><![CDATA[This is a Request for Input. Dual MySQL masters with MMM in a single datacentre are in common use, and other setups like DRBD and of course VM/SAN based failover solutions are conceptually straightforward also. Thus, achieving various forms of resilience within a single data-centre is doable and not costly. Doing the same across multiple [...]]]></description>
			<content:encoded><![CDATA[This is a Request for Input. Dual MySQL masters with MMM in a single datacentre are in common use, and other setups like DRBD and of course VM/SAN based failover solutions are conceptually straightforward also. Thus, achieving various forms of resilience within a single data-centre is doable and not costly.

Doing the same across multiple (let&#8217;s for simplicity&#8217;s sake limit it to two) datacentres is another matter. MySQL replication works well across longer links, and it can use MySQL&#8217;s in-built SSL or tools like stunnel. Of course it needs to be kept an eye on, as usual, but since it&#8217;s asynchronous the latency between the datacentres is not a big issue (apart from the fact that the second server gets up-to-date a little bit later).

But as those who have tried will know, having a client (application server) connection to a MySQL instance in a remote data-centre is a whole other matter: latency becomes a big issue and is generally very noticeable on the front-end. One solution for that is to have application servers only connect to their &#8220;local&#8221; MySQL server.

So the question to you is: do you now have (or have you had in the past) a setup with MySQL masters in different datacentres? What did that setup look like (which additional tools and infra did you use for it), and what were your experiences (good and bad, solutions to issues, etc)? I&#8217;m trying to gather additional expertise that might already be about, which can help us all. Please add your input! Thanks.]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/quest-resilience-multidc-masters/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Open Query @ MySQL Conf &amp; Expo 2010</title>
		<link>http://openquery.com/blog/open-query-mysql-conf-expo-2010</link>
		<comments>http://openquery.com/blog/open-query-mysql-conf-expo-2010#comments</comments>
		<pubDate>Thu, 08 Apr 2010 14:53:18 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[GRAPH engine]]></category>
		<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[graphengine]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mmm]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[mysqlconf]]></category>
		<category><![CDATA[neo4j]]></category>
		<category><![CDATA[open query]]></category>
		<category><![CDATA[OQGRAPH]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1216</guid>
		<description><![CDATA[Walter and I are giving a tutorial on Monday morning, MySQL (and MariaDB) Dual Master Setups with MMM, I believe there are still some seats available &#8211; tutorials are a bit extra when you register for the conference, so you do need to sign up if you want to be there! It&#8217;s a hands-on tutorial/workshop, we&#8217;ll be [...]]]></description>
			<content:encoded><![CDATA[Walter and I are giving a tutorial on Monday morning, <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/12434" target="_blank">MySQL (and MariaDB) Dual Master Setups with MMM</a>. I believe there are still some seats available &#8211; tutorials cost a bit extra when you <a href="https://en.oreilly.com/mysql2010/public/register" target="_blank">register for the conference</a>, so you do need to sign up if you want to be there! It&#8217;s a hands-on tutorial/workshop: we&#8217;ll be setting up multiple clusters with dual masters and all the rest of the MMM fun, using VMs on your laptops and a separate wired network. Nothing beats messing with something live, breaking it, and seeing what happens!

Then on Tuesday afternoon (5:15pm, Ballroom F), Antony and I will do a session on the <a href="http://en.oreilly.com/mysql2010/public/schedule/detail/12586" target="_blank">OQGRAPH engine: hierarchies/graphs inside the database made easy</a>. If you&#8217;ve been struggling with trees in SQL, would really like to effectively use social networking in your applications, need to work with RDF datasets, or have been exploring <em>neo4j</em> but otherwise have everything in MySQL or MariaDB, this session is for you.

We (and a few others from OQ) will be around for the entire conference, the <a href="http://www.pythian.com/news/8809/announcing-monday-night-community-dinner-at-pedros-during-the-oreilly-mysql-conference-expo/" target="_blank">community dinner</a> (Monday evening) and other social events, and are happy to answer any questions you might have. You&#8217;ll be able to easily recognise us in the crowds by our distinct friendly <a href="http://openquery.com" target="_blank">Open Query</a> olive green shirts (green stands out because most companies mainly use blue/grey and orange/red).

Naturally we would love to do business with you (<a href="http://openquery.com/services/proactive" target="_blank">proactive support services</a>, <a href="http://openquery.com/graph" target="_blank">OQGRAPH development</a>), but we don&#8217;t push ourselves into unsuitable scenarios. In fact, we&#8217;re known to refer and even actively introduce clients to other competent vendors where appropriate. In any case, it&#8217;s our pleasure and privilege to meet you!

See you all in Santa Clara in a few days.]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/open-query-mysql-conf-expo-2010/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
