<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open Query blog &#187; Good practice / Bad practice</title>
	<atom:link href="http://openquery.com/blog/category/good-practice-bad-practice/feed" rel="self" type="application/rss+xml" />
	<link>http://openquery.com/blog</link>
	<description>About MySQL, Drizzle, MariaDB and more!</description>
	<lastBuildDate>Sun, 29 Apr 2012 23:48:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>What a Hosting Provider did Today</title>
		<link>http://openquery.com/blog/what-hosting-provider-did-today</link>
		<comments>http://openquery.com/blog/what-hosting-provider-did-today#comments</comments>
		<pubDate>Mon, 31 Oct 2011 06:39:15 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[destruction]]></category>
		<category><![CDATA[helpful]]></category>
		<category><![CDATA[hosting]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[logfile]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[recovery]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[tablespace]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1573</guid>
		<description><![CDATA[I found Dennis the Menace, he now has a job as system administrator for a hosting company. Scenario: client has a problem with a server becoming unavailable (cause unknown) and has it restarted. MySQL had some page corruption in the InnoDB tablespace. The hosting provider, being really helpful, goes in as root and first deletes [...]]]></description>
			<content:encoded><![CDATA[<p>I found Dennis the Menace, he now has a job as system administrator for a hosting company. Scenario: client has a problem with a server becoming unavailable (cause unknown) and has it restarted. MySQL had some page corruption in the InnoDB tablespace.</p>
<p>The hosting provider, being really helpful, goes in as root and first deletes ib_logfile* then ib* in /var/lib/mysql. He later says &#8220;I am sorry if I deleted it. I thought I deleted the log only. Sorry again.&#8221; Now this may appear nice, but people who know what they&#8217;re doing with MySQL will realise that deleting the ib_logfile* files on their own already destroys data. MySQL of course screams loudly that while it has FRM files it can&#8217;t find the tables. No kidding!</p>
<p>Then, while he&#8217;s been told to not touch anything any more, and I&#8217;m trying to see if I can recover the deleted files on the ext3 filesystem (yes there are tools for that), he goes in again and puts an ibdata1 file back. No, not the logfiles &#8211; but he had those somewhere else too. The files get restored and turn out to be two months old (no info on how they were made in the first place but that&#8217;s a minor detail in this grand scheme). All the extra write activity on the partition would&#8217;ve also made potential deleted file recovery more difficult or impossible.</p>
<p>This story will still get a &#8220;happy&#8221; ending, using a recent mysqldump to load a new server at a different hosting provider. Really &#8211; some helpfulness is not what you want. Secondary lesson: pick your hosting provider with care. Feel free to ask us for recommendations as we know some excellent providers and have encountered plenty of poor ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/what-hosting-provider-did-today/feed</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>When Clever Goes Wrong &amp; How Etsy Overcame &#8211; Arstechnica</title>
		<link>http://openquery.com/blog/when-clever-goes-wrong-how-etsy-overcame-arstechnica</link>
		<comments>http://openquery.com/blog/when-clever-goes-wrong-how-etsy-overcame-arstechnica#comments</comments>
		<pubDate>Wed, 05 Oct 2011 02:35:18 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[middleware]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[stored procedures]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/when-clever-goes-wrong-how-etsy-overcame-arstechnica</guid>
		<description><![CDATA[In 2007, Etsy made a big bet on homegrown middleware to help with the site&#8217;s scalability. A half-year after it was taken live, the company decided to abandon it. As a senior software engineer at Etsy put it, &#8220;if you&#8217;re doing something &#8216;clever,&#8221; you&#8217;re probably doing it wrong.&#8221; Read the full article at Arstechnica.com I [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>In 2007, Etsy made a big bet on homegrown middleware to help with the site&#8217;s scalability. A half-year after it was taken live, the company decided to abandon it. As a senior software engineer at Etsy put it, &#8220;if you&#8217;re doing something &#8216;clever,&#8217; you&#8217;re probably doing it wrong.&#8221;</p>
<p><em>Read the full article at <a href="http://arstechnica.com/business/news/2011/10/when-clever-goes-wrong-how-etsy-overcame-poor-architectural-choices.ars" target="_blank">Arstechnica.com</a></em></p></blockquote>
<p>I want to focus on the important lessons from this article: middleware, and stored procedures used in this fashion for a public web application, create unscalable design complexity (smart and &#8220;proper&#8221; according to the old enterprise design teachings&#8230;) &#8211; causing infrastructure, development and maintenance hassles.</p>
<p>In the process they did replace PostgreSQL with MySQL but that&#8217;s not the critical change that made all the difference. PostgreSQL is a fine database system also.</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/when-clever-goes-wrong-how-etsy-overcame-arstechnica/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On Password Strength</title>
		<link>http://openquery.com/blog/password-strength</link>
		<comments>http://openquery.com/blog/password-strength#comments</comments>
		<pubDate>Thu, 11 Aug 2011 00:38:17 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[application]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[entropy]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[password]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[xkcd]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1520</guid>
		<description><![CDATA[XKCD (as usual) makes a very good point &#8211; this time about password strength, and I reckon it&#8217;s something app developers need to consider urgently. Geeks can debate the exact amount of entropy, but that&#8217;s not really the issue: insisting on mixed upper/lower and/or non-alpha and/or numerical components to a user password does not really [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://xkcd.com/936/" target="_blank">XKCD</a> (as usual) makes a very good point &#8211; this time about password strength, and I reckon it&#8217;s something app developers need to consider urgently. Geeks can debate the exact amount of entropy, but that&#8217;s not really the issue: insisting on mixed upper/lower and/or non-alpha and/or numerical components to a user password does not really improve security, and definitely makes life more difficult for users.</p>
<p>So basically, the functions that do a &#8220;is this a strong password&#8221; should seriously reconsider their approach, particularly if they&#8217;re used to have the app decide whether to accept the password as &#8220;good enough&#8221; at all.</p>
<p><a href="http://xkcd.com/936/" target="_blank"><img class="alignnone" src="http://imgs.xkcd.com/comics/password_strength.png" alt="" width="518" height="421" /></a></p>
<p>Update: Jeff Preshing has written an <a href="http://preshing.com/20110811/xkcd-password-generator" target="_blank">xkcd password generator</a>. Users probably should choose their own four words, but it&#8217;s a nice example and a similar method could be used by an app to give &#8220;password suggestions&#8221; that are still safe.</p>
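<p>To make the &#8220;password suggestions&#8221; idea concrete, here is a minimal sketch of how an app might do it. The word list, function names and parameters are all made up for illustration; this is not Jeff Preshing&#8217;s code, and a real word list would need thousands of entries.</p>

```python
import math
import secrets

# Hypothetical tiny word list, for illustration only; a real list
# should contain thousands of common words.
WORDS = ["correct", "horse", "battery", "staple", "garden", "window",
         "purple", "river", "candle", "market", "silver", "planet",
         "butter", "rocket", "forest", "pencil"]


def suggest_passphrase(words, count=4):
    """Pick `count` words independently and uniformly at random."""
    return " ".join(secrets.choice(words) for _ in range(count))


def passphrase_entropy_bits(list_size, count=4):
    """Entropy in bits of `count` uniform picks from `list_size` words."""
    return count * math.log2(list_size)


if __name__ == "__main__":
    print(suggest_passphrase(WORDS))
    print(passphrase_entropy_bits(2048))
```

<p>With a 2048-word list, four words come to 44 bits, which is the figure the comic uses. The entropy only holds if the words really are picked at random, which is why the sketch uses the secrets module rather than letting the user &#8220;randomly&#8221; think of words.</p>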
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/password-strength/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL data backup: going beyond mysqldump</title>
		<link>http://openquery.com/blog/mysql-data-backup-mysqldump</link>
		<comments>http://openquery.com/blog/mysql-data-backup-mysqldump#comments</comments>
		<pubDate>Tue, 29 Mar 2011 00:25:12 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[Software and tools]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mmm]]></category>
		<category><![CDATA[mysqldump]]></category>
		<category><![CDATA[recovery]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[restore]]></category>
		<category><![CDATA[xtrabackup]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1446</guid>
		<description><![CDATA[A user on a linux user group mailing list asked about this, and I was one of the people replying. Re-posting here as I reckon it&#8217;s of wider interest. &#62; [...] tens of gigs of data in MySQL databases. &#62; Some in memory tables, some MyISAM, a fair bit InnoDB. According to my &#62; understanding, [...]]]></description>
			<content:encoded><![CDATA[<p>A user on a linux user group mailing list asked about this, and I was one of the people replying. Re-posting here as I reckon it&#8217;s of wider interest.</p>
<p>&gt; [...] tens of gigs of data in MySQL databases.<br />
&gt; Some in memory tables, some MyISAM, a fair bit InnoDB. According to my<br />
&gt; understanding, when one doesn&#8217;t have several hours to take a DB<br />
&gt; offline and do dbbackup, there was/is ibbackup from InnoBase.. but now<br />
&gt; that MySQL and InnoBase have both been &#8216;Oracle Enterprised&#8217;, said<br />
&gt; product is now restricted to MySQL Enterprise customers..<br />
&gt;<br />
&gt; Some quick searching has suggested Percona XtraBackup as a potential<br />
&gt; FOSS alternative.<br />
&gt; What backup techniques do people employ around these parts for backups<br />
&gt; of large mixed MySQL data sets where downtime *must* be minimised?<br />
&gt;<br />
&gt; Has your backup plan ever been put to the test?</p>
<p>You should put it to the test regularly, not just when it&#8217;s needed.<br />
An untested backup is not really a backup, I think.</p>
<p>At <a href="http://openquery.com/" target="_blank">Open Query</a> we tend to use dual master setups with MMM, other replication slaves, mysqldump, and XtraBackup or LVM snapshots. It&#8217;s not just about having backups, but also about general resilience, maintenance options, and scalability. I&#8217;ll clarify:</p>
<ul>
<li>XtraBackup and LVM give you physical backups. That&#8217;s nice if you want to recover or clone a complete instance as-is. But if anything is wrong, it&#8217;ll be all stuffed (that is, you can sometimes recover InnoDB tablespaces and there are tools for it, but time may not be on your side). Note that LVM cannot snapshot between multiple volumes consistently, so if you have your InnoDB ibdata/IBD files and ib_logfile files on separate spindles, using LVM is not suitable.</li>
</ul>
<ul>
<li>mysqldump for logical (SQL) backups. Most if not all setups should have  this. Even if the file(s) were to be corrupted, they&#8217;re still readable  since it&#8217;s plain SQL. You can do partial restores, which is handy in  some cases. It&#8217;ll be slower to load so having *only* an SQL dump of a  larger dataset is not a good idea.</li>
</ul>
<ul>
<li>some of the above backups can and should *also* be copied off-site. That&#8217;s for extra safety, but in terms of recovery speed it may not be optimal and should not be relied upon.</li>
</ul>
<ul>
<li>having dual masters is for easier maintenance  without scheduled outages, as well as resilience when for instance  hardware breaks (and it does).</li>
</ul>
<ul>
<li>slaves. You can even delay a  slave (Maatkit has a tool for this), so that would give you a live  correct image even in case of a user error, provided you get to it in  time. Also, you want enough slack in your infra to be able to initialise  a new slave off an existing one. Scaling up at a time when high load is  already occurring can become painful if your infra is not prepared for  it.</li>
</ul>
<p><strong>A key issue to consider is this&#8230; if the dataset is  sufficiently large, and the online requirements high enough, you can&#8217;t  afford to just have backups. Why? Because, how quickly can you deploy  new suitable hardware, install OS, do restore, validate, put back  online?</strong></p>
<p><em>In many cases one or more aspects of the above list simply  take too long, so my summary would be &#8220;then you don&#8217;t really have a  backup&#8221;. Clients tend to argue with me on that, but only fairly briefly, until  they see the point: if a restore takes longer than you can afford, that  backup mechanism is unsuitable.</em></p>
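<p>The arithmetic behind that argument can be sketched in a few lines. Everything here is an assumption for illustration: the function names, the idea of a single restore throughput figure, and the example numbers in the usage note.</p>

```python
def restore_hours(dataset_gb, restore_gb_per_hour, provision_hours, validate_hours):
    """Rough end-to-end time to get back online from a backup."""
    return provision_hours + dataset_gb / restore_gb_per_hour + validate_hours


def backup_is_suitable(total_hours, affordable_outage_hours):
    """A backup only counts if the restore fits in the outage you can afford."""
    return affordable_outage_hours >= total_hours


if __name__ == "__main__":
    total = restore_hours(200, 50, 4, 1)
    print(total, backup_is_suitable(total, 2))
```

<p>For a hypothetical 200 GB dataset that loads at 50 GB per hour, with 4 hours to get hardware and an OS in place and 1 hour to validate, that is 9 hours. If the business can only afford 2 hours offline, that backup on its own is not enough, however reliable it is.</p>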
<p>So, we use a combination of tools  and approaches depending on needs, but in general terms we aim for  keeping the overall environment online (individual machines can and will  fail! relying on a magic box or SAN to not fail *will* get you bitten)  to vastly reduce the instances where an actual restore is required.<br />
Into  that picture also comes using separate test/staging servers to not have  developers stuff around on live servers (human error is an important  cause of hassles).</p>
<p>In our training modules, we&#8217;ve combined the  backups, recovery and replication topics as it&#8217;s clearly all intertwined  and overlapping. Discussing backup techniques separate from replication  and dual master setups makes no sense to us. It needs to be put in  place with an overall vision.</p>
<p>Note that a SAN is not a backup strategy. And neither is replication on its own.</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/mysql-data-backup-mysqldump/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Challenge: identify this pattern in datadir</title>
		<link>http://openquery.com/blog/challenge-identify-this-pattern-in-datadir</link>
		<comments>http://openquery.com/blog/challenge-identify-this-pattern-in-datadir#comments</comments>
		<pubDate>Mon, 02 Aug 2010 02:07:14 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[datadir]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[symbolic link]]></category>
		<category><![CDATA[symlink]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1373</guid>
		<description><![CDATA[You take a look at someone&#8217;s MySQL (or MariaDB) data directory, and see mysql foo bar -&#62; foo What&#8217;s the issue? Identify pattern. What does it mean?  Consequences. Is there any way it can be safe and useful/usable? Describe. Good luck!]]></description>
			<content:encoded><![CDATA[<p>You take a look at someone&#8217;s MySQL (or MariaDB) data directory, and see</p>
<blockquote><p>mysql<br />
foo<br />
bar -&gt; foo</p></blockquote>
<ol>
<li>What&#8217;s the issue? Identify pattern.</li>
<li>What does it mean?  Consequences.</li>
<li>Is there any way it can be safe and useful/usable? Describe.</li>
</ol>
<p>Good luck!</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/challenge-identify-this-pattern-in-datadir/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Unqualified COUNT(*) speed PBXT vs InnoDB</title>
		<link>http://openquery.com/blog/unqualified-count-speed-pbxt-vs-innodb</link>
		<comments>http://openquery.com/blog/unqualified-count-speed-pbxt-vs-innodb#comments</comments>
		<pubDate>Thu, 27 May 2010 04:54:47 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[COUNT]]></category>
		<category><![CDATA[index scan]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[MyISAM]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[pbxt]]></category>
		<category><![CDATA[reporting]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1261</guid>
		<description><![CDATA[So this is about a SELECT COUNT(*) FROM tblname without a WHERE clause. MyISAM has an optimisation for that since it maintains a rowcount for each table. InnoDB and PBXT can&#8217;t do that (at least not easily) because of their multi-versioned nature&#8230; different transactions may see a different number of rows for the table table! [...]]]></description>
			<content:encoded><![CDATA[<p>So this is about a <strong>SELECT COUNT(*) FROM tblname</strong> without a <strong>WHERE</strong> clause. MyISAM has an optimisation for that since it maintains a rowcount for each table. InnoDB and PBXT can&#8217;t do that (at least not easily) because of their multi-versioned nature&#8230; different transactions may <em>see</em> a different number of rows for the same table!</p>
<p>So, it&#8217;s kinda known but nevertheless often ignored that this operation on InnoDB is costly in terms of time; what InnoDB has to do to figure out the exact number of rows is scan the primary key and just tally. Of course it&#8217;s faster if it doesn&#8217;t have to read a lot of the blocks from disk (i.e. smaller dataset or a large enough buffer pool).</p>
<p>I was curious about PBXT&#8217;s performance on this, and behold it appears to be quite a bit faster! For a table with 50 million rows, PBXT took about 20 minutes whereas the same table in InnoDB took 30 minutes. Interesting!</p>
<p>From those numbers [addendum: yes I do realise there's something else wrong on that server to take that long, but it'd be slow regardless] you can tell that doing the query at all is not an efficient thing to do, and definitely not something a frontend web page should be doing. Usually you just need a ballpark figure so running the query in a cron job and putting the value into memcached (or just an include file) will work well in such cases.</p>
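<p>A minimal sketch of the cron-job-plus-cache approach, using a small JSON file as the &#8220;include file&#8221;. The function names and the file layout are made up for illustration, and count_rows stands in for whatever code actually runs the slow COUNT(*) query.</p>

```python
import json
import os
import tempfile
import time


def refresh_count(cache_path, count_rows):
    """Run the slow count once (e.g. from cron) and store it with a timestamp."""
    payload = {"rows": count_rows(), "counted_at": int(time.time())}
    # Write to a temp file and rename, so readers never see a partial file.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    os.replace(tmp_path, cache_path)


def read_count(cache_path):
    """Cheap read for the web page: no database query involved."""
    with open(cache_path) as handle:
        return json.load(handle)["rows"]
```

<p>The cron job calls refresh_count() every so often; the web page only ever calls read_count(). The figure is slightly stale, which is fine when a ballpark number is all you need.</p>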
<p>If you do use a WHERE clause, all engines (including MyISAM) are in the same boat&#8230; they  might be able to use an index to filter on the conditions &#8211; but the  bigger the table, the more work it is for the engine. PBXT being faster than InnoDB for this task makes it potentially interesting for reporting purposes as well, where otherwise you might consider using MyISAM &#8211; we generally recommend using a separate reporting slave with particular settings anyway (fewer connections but larger session-specific buffers), but it&#8217;s good to have extra choices for the task.</p>
<p>(In case you didn&#8217;t know, it&#8217;s ok for a slave to use a different engine from a master &#8211; so you can really make use of that ability for specialised tasks such as reporting.)</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/unqualified-count-speed-pbxt-vs-innodb/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The actual range and storage size of an INT</title>
		<link>http://openquery.com/blog/actual-range-storage-size-int</link>
		<comments>http://openquery.com/blog/actual-range-storage-size-int#comments</comments>
		<pubDate>Mon, 29 Mar 2010 02:45:03 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[birthdate]]></category>
		<category><![CDATA[data types]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[date-of-birth]]></category>
		<category><![CDATA[int]]></category>
		<category><![CDATA[integer]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[year]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1205</guid>
		<description><![CDATA[What&#8217;s the difference between INT(2) and INT(20) ? Not a lot. It&#8217;s about output formatting, which you&#8217;ll never encounter when talking with the server through an API (like you do from most app languages). The confusion stems from the fact that with CHAR(n) and VARCHAR(n), the (n) signifies the length or maximum length of that [...]]]></description>
			<content:encoded><![CDATA[<p>What&#8217;s the difference between INT(2) and INT(20) ? Not a lot. It&#8217;s about output formatting, which you&#8217;ll never encounter when talking with the server through an API (like you do from most app languages).</p>
<p>The confusion stems from the fact that with CHAR(n) and VARCHAR(n), the (n) signifies the length or maximum length of that field. But for INT, the range and storage size is specified using different data types: TINYINT, SMALLINT, MEDIUMINT, INT (aka INTEGER), BIGINT.</p>
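<p>As a quick illustration, the range follows from the storage size of the type alone. The sketch below (names are my own) computes it from the byte sizes of the MySQL integer types; note that the display width does not appear anywhere, so INT(2) and INT(20) both come out as plain INT.</p>

```python
# Storage size in bytes of each MySQL integer type.
INT_TYPE_BYTES = {
    "TINYINT": 1,
    "SMALLINT": 2,
    "MEDIUMINT": 3,
    "INT": 4,
    "BIGINT": 8,
}


def int_range(type_name, unsigned=False):
    """Return (min, max) for a MySQL integer type; display width plays no part."""
    bits = 8 * INT_TYPE_BYTES[type_name]
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


if __name__ == "__main__":
    for name in INT_TYPE_BYTES:
        print(name, int_range(name), int_range(name, unsigned=True))
```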
<p>At Open Query we tend to pick on things like INT(2) when reviewing a client&#8217;s schema, because chances are that the developers/DBAs are working under a mistaken assumption and this could cause trouble somewhere &#8211; even if not in the exact spot where we pick on it. So it&#8217;s a case of pattern recognition.</p>
<p>A very practical example of this comes from a client I worked with last week. I first spotted some harmless ones, we talked about it, and then we hit the jackpot: INT(22) or something, which in fact was storing a unix timestamp converted to int by the application, for the purpose of, wait for it, the user&#8217;s birth date. There&#8217;s a number of things wrong with this, and the result is something that doesn&#8217;t work properly.</p>
<p>Currently, the unix epoch/timestamp when stored in binary is commonly a 32 bit signed integer, which covers dates up to January 2038 (MySQL&#8217;s TIMESTAMP type runs from 1970-01-01 to that same point in 2038). Treated as unsigned, the same 32 bits reach 2106 instead, but then nothing before 1970-01-01 can be represented at all.</p>
<ul>
<li> if using signed, dates before 1970 need negative values, and whether those survive depends on every piece of app code and every database function that touches the field. MySQL&#8217;s own TIMESTAMP handling, for one, starts at 1970.</li>
</ul>
<ul>
<li> using a timestamp for date-of-birth tells me that the developers are young <img src='http://openquery.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  well that&#8217;s relative, but in this case: younger than 40. I was born in 1969, so I am very aware that it&#8217;s impossible to represent my birthdate in a unix timestamp! What dates do you test with? Your own, and people around you. &#8216;nuf said.</li>
<li>finally, INT(22) is still an INT, which for MySQL means 32 bits (4 bytes) and it happened to be signed also.</li>
</ul>
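<p>The birthdate problem is easy to demonstrate. In this sketch the function names are my own and the 1969 date is an arbitrary example; it converts a calendar date to seconds since the epoch and checks whether the result would fit an unsigned 32 bit field.</p>

```python
from datetime import date, datetime, timezone


def to_unix_timestamp(day):
    """Seconds between 1970-01-01 00:00:00 UTC and midnight UTC on `day`."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return int((midnight - epoch).total_seconds())


def fits_unsigned_32(value):
    """True when `value` can be stored in an unsigned 32-bit integer."""
    return value in range(0, 2 ** 32)


if __name__ == "__main__":
    born = to_unix_timestamp(date(1969, 7, 1))
    print(born, fits_unsigned_32(born))
```

<p>Any date before 1970 comes out negative, so it simply has no unsigned representation. A DATE column has no such problem.</p>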
<p>So, all in all, this wasn&#8217;t going to work. Exactly what would fail where would be highly app code (and date) dependent, but you can tell it needs a quick redesign anyway.</p>
<p>I actually suggested checking the requirements to see whether having just a year would suffice for the intended use (it can be stored in a YEAR(4) field); this reduces the amount of personal data stored and thus reduces privacy concerns. Otherwise, use a DATE field, which can optionally be allowed to not have a day-of-month (i.e. only ask for year/month), as that again can be sufficient for the intended purpose.</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/actual-range-storage-size-int/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>relay-log-space-limit</title>
		<link>http://openquery.com/blog/relaylogspacelimit</link>
		<comments>http://openquery.com/blog/relaylogspacelimit#comments</comments>
		<pubDate>Wed, 24 Mar 2010 01:19:05 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[relay-log-space-limit]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1201</guid>
		<description><![CDATA[We don&#8217;t often see this option configured (default: unlimited) but it might be a good idea to set it. What it does is limit the amount of disk space the combined relay logs are allowed to take up. A slave&#8217;s IO_Thread reads from the master and puts the events into the relay log; the slave&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>We don&#8217;t often see this option configured (default: unlimited) but it might be a good idea to set it. What it does is limit the amount of disk space the combined relay logs are allowed to take up.</p>
<p>A slave&#8217;s IO_Thread reads from the master and puts the events into the relay log; the slave&#8217;s SQL_Thread reads from the relay log and executes the query. If/when replication &#8220;breaks&#8221;, unless it&#8217;s connection related it tends to be during execution of a query. In that case the IO_Thread will keep running (receiving master events and storing in the relay log). Beyond some point, that doesn&#8217;t make sense.</p>
<p>The reason for having two separate replication threads (introduced in MySQL 4.0) is that long-running queries don&#8217;t delay receiving more data. That&#8217;s good. But receiving data is generally pretty fast, so as long as that basic issue is handled, it&#8217;s not necessary (for performance) to have the IO_Thread run ahead that far.</p>
<p>So you can set something like relay-log-space-limit=256M. This prevents slave disk space from getting gobbled up in some replication failure scenarios. The data will still be available in the logs on the master (provided of course the log expiration there isn&#8217;t too short &#8211; replication monitoring is still important!).</p>
<p>Conclusion: treat the relay log as a cache. Don&#8217;t leave it at &#8220;Unlimited&#8221;; that&#8217;s an inefficient (and potentially problematic) use of resources. If you do run out of disk space, the relay log can get corrupted &#8211; then you have to reposition, which will re-read the data from the master anyway.</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/relaylogspacelimit/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Know your data &#8211; and your numeric types.</title>
		<link>http://openquery.com/blog/data-numeric-types</link>
		<comments>http://openquery.com/blog/data-numeric-types#comments</comments>
		<pubDate>Wed, 23 Dec 2009 16:12:16 +0000</pubDate>
		<dc:creator>toby</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[accuracy]]></category>
		<category><![CDATA[decimal]]></category>
		<category><![CDATA[double]]></category>
		<category><![CDATA[float]]></category>
		<category><![CDATA[integer]]></category>
		<category><![CDATA[latitude]]></category>
		<category><![CDATA[longitude]]></category>
		<category><![CDATA[mariadb]]></category>
		<category><![CDATA[modelling]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[numeric]]></category>
		<category><![CDATA[precision]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1158</guid>
		<description><![CDATA[Some guidelines to choosing between MySQL's numeric types, using longitude and latitude as a modelling example.]]></description>
			<content:encoded><![CDATA[
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">- Data precision, which is a property of the values we are storing.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">If I am storing my height as cm in a FLOAT column, the data precision is only 4 significant digits as discussed, but the machine precision of this column is always about 7 significant digits, no matter what we try to store. Clearly to avoid throwing away significant parts of your input, the machine precision should exceed your data precision.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Consider a &lt;a href=&#8221;http://geocoder.ibegin.com/downloads/canada_cities.zip&#8221;&gt;CSV file of latitude and longitude&lt;/a&gt;, and notice that every value has 11 digits after the point:</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;"><span style="white-space: pre;"> </span>Afton Station,NS,45.6051050000,-61.6974950000</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;"><span style="white-space: pre;"> </span>Agassiz,BC,49.2421627750,-121.7496169988</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">This does not mean that every value has 14 digits of precision &#8211; if that were so, then these coordinates would be accurate to within 0.01mm at the equator! This is clearly not true.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Let&#8217;s say we merely want to stick markers on a national map. One hundred metre resolution would be more than adequate. Given an Earth circumference of 40,075,020 metres, 100 metres is approximately 1/1113 of a degree. While three digits (0.001) can represent 1/1000ths of a degree, this is not quite precise enough, so let&#8217;s keep four fractional digits after the point. Therefore we are looking to represent 7 significant figures, for example:</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;"><span style="white-space: pre;"> </span>Afton Station,NS,45.60511,-61.69750</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Which of FLOAT or DOUBLE is the right type to use for such values? Let&#8217;s investigate.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">A key difference between floating point types and fixed point representations (such as DECIMAL) is that while the overall binary precision is fixed, the precision of the fractional part of the value can vary! The available precision depends on magnitude of the value. The highest precision is available for values closest to zero, and precision gets worse as numbers increase in magnitude (by every power of two).</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">To understand the effect of this, it is necessary to examine floating point representation in more detail. (I am going to handwave more esoteric features of &lt;a href=&#8221;http://en.wikipedia.org/wiki/IEEE_754-1985&#8243;&gt;the IEEE standard&lt;/a&gt; and try to give a general picture.)</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Floating point values have three parts:</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">- a sign (+/-)</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">- an &#8220;exponent&#8221; (binary scale factor)</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">- the value itself (known as the fraction or &#8220;mantissa&#8221;).</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">These are analogous to decimal &#8220;scientific&#8221; or &#8220;exponential&#8221; notation that you may already be familiar with (e.g. 76.4935 = 7.64935 x 10^1 or 7.64935E+1, where 1 is the decimal exponent).</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">The combination of these fields precisely defines a rational number, whose value is desirably &#8220;close enough&#8221; to the value you need to store (which was imprecise to begin with, so the approximation involved in converting to floating point representation is normally not a problem).</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">The overall precision available is determined by the bits allowed to store the fraction. For reference,</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">- FLOAT allows 23 bits for the fraction</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">- DOUBLE allows 52 bits for the fraction</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Respectively, this means about 7 and 16 decimal digits in total. So it appears that FLOAT is probably adequate for our seven digit needs.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">To make sure, let&#8217;s work backwards and confirm just how precisely a FLOAT value can represent a latitude. To do this I will show how an example value is converted into the FLOAT representation. None of the values in our table will have latitudes greater than 77 degrees, so I will pick 76.4935. (Higher values have larger exponents and hence the least available precision for the fractional part, so are the safest test.)</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">First we need to determine the correct exponent for the value. Then we can work out the real-world &#8220;resolution&#8221; of the number, i.e. how much the actual value changes if we change the fraction by the smallest possible amount (i.e., in its least significant bit).</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">The exponent is the largest power of 2 that divides the value, to &#8220;normalise&#8221; it into the range 0..1. A glance shows us that for 76, a divisor of 128 (2^7) is the right one. That is, the floating point exponent is 7. And the resulting fraction is 76.4935 / 128, or in decimal, 0.59760546875.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Remembering that the binary fraction portion has 23 bits, let&#8217;s examine its &#8220;binary expansion&#8221;. This is effectively just the result of 0.59760546875 x 2^23, written out in base 2:</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">1001100 . 0111111001010110  (total 23 bits)</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">^^^^^^^   ^^^^^^^^^^^^^^^^</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">whole #   fraction part</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">=    76 . 4935 (approx)</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Note that, because the exponent is 7, the first 7 bits make up the whole number part (= binary 1001100 = 76). I&#8217;ve put a gap where the &#8220;binary point&#8221; belongs. Written out like this, we can see that we have 16 bits to the right of this point. 2^16 = 65536; so, around this magnitude of 76 degrees (and up to 128, as at 128 the binary exponent increases to 8), we can resolve to 1/65536th of a degree. This is enough bits for four decimal digits (which only requires 1/10000th).</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">So exactly how precise on the ground will this be?</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">1/65536th of a degree of latitude is about 1.7 metres: much better than our hoped for resolution of 100 metres. So we have shown that FLOAT is more than adequate for the job.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">The same analysis can be done for DOUBLE, of course. For interest&#8217;s sake, the equivalent binary expansion is:</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">1001100 . 011111100101011000000100000110001001001101110  (total 52 bits)</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">We have 29 more bits to play with, or a total of 45 fraction bits after the whole number part, at this magnitude. This is ridiculously precise, and can resolve 1/35184372088832th of a degree; or 0.00114mm on the surface of the globe. (This is enough to represent 13 decimal digits after the point.)</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">This example has shown how knowing a little of how floating point works can help you be confident about issues of precision, when choosing types to represent approximate values, or measurements. The key is to know your data, and understand how much precision you have, and how much your application needs.</div>
<p>In this &#8220;Good Practice/Bad Practice&#8221; post I hope to give some guidelines for choosing between MySQL&#8217;s numeric types, using longitude and latitude as a modelling example. (Disclaimer: I am not a mathematician, and the generalisations here are meant to help with practical modelling questions rather than be rigorously theoretical.)</p>
<p>Numeric types in MySQL fall into two main varieties:</p>
<ul>
<li>&#8220;precise&#8221; types such as INTEGER and DECIMAL;</li>
<li>the IEEE-standard floating point types FLOAT and DOUBLE.</li>
</ul>
<p>As a rule of thumb, the first group is for <strong>exact, or &#8220;counted&#8221; quantities.</strong> The INTEGER types represent whole numbers, and DECIMAL represents &#8220;fixed point&#8221; decimal, with a preset number of places after the decimal point.</p>
<p>Various widths of INTEGER are available in MySQL, from 8-bit TINYINT to 64-bit BIGINT. Calculations with integer types are fast, as they usually correspond to hardware register sizes.</p>
<p>DECIMAL is commonly used for quantities like decimal currency where the number of digits of precision is known and fixed; for example, counting pennies exactly in two decimal places. Computation with DECIMAL is slower than with the other types, but this is unlikely to impact most applications.</p>
<p>In the other category are FLOAT and DOUBLE, the 32 and 64-bit IEEE standard types. They are usually supported in hardware and are therefore fast and convenient for arithmetic. These are generally good choices for <strong>&#8220;measurements&#8221; &#8211; values with limited precision.</strong></p>
<p>It is important to understand what is meant by &#8220;precision&#8221; (Wikipedia has <a href="http://en.wikipedia.org/wiki/Arithmetic_precision">a full discussion</a>). For example, I can measure my height at 185.8 centimetres. Because of the way I make the measurement, which we know to be approximate, this figure is understood to have only one meaningful digit after the decimal point.</p>
<p>Precision is a property of all inexact real world &#8220;measurements&#8221; &#8211; such as position, length, weight, brightness; it is usually expressed as the number of &#8220;significant figures&#8221; or significant digits. (My height measurement has four significant digits.) This should be considered when values are displayed. It is somewhat misleading to represent my height as 185.8000 &#8211; common sense tells us that the value is not this accurate.</p>
<p>Serious problems can occur when we do not know the actual precision of measurements and wrongly assume a greater precision than exists. A typical example might be a GPS map display which uses measured position to locate the user relative to features such as rivers, roads, and railway tracks. The map display is high resolution and implies a great deal of precision to the viewer. Let us say that based on incoming data, it places our vehicle three metres east of a river. If the measurement has a true precision of, say, 20 metres, we cannot even know which side of the river we are on! So to allow users to draw safe conclusions, the presentation of data needs to take precision into account.</p>
<p>(While very frequently cited as examples of &#8220;precise&#8221; figures, not all monetary values are. For example, the value of the <a href="http://twitter.com/nationaldeficit">USA&#8217;s national deficit</a> was estimated today at $11,983,250,643,192.95. I am no economist, but rather few of these digits can actually be significant!)</p>
<p>Throwing away precision with inappropriate modelling is also a potential problem. Ideally measurements arrive with an implied or explicit precision. But even without that, we need to assure ourselves that we are safely storing them.</p>
<p>Keep in mind that we are always dealing with two kinds of precision:</p>
<ul>
<li>Machine precision, which is defined by the chosen type; and</li>
<li>Data precision, which is a property of the values we are storing.</li>
</ul>
<p>If I am storing my height in cm in a FLOAT column, the data precision is only 4 significant digits as discussed, but the machine precision of this column is always about 7 significant digits, no matter what we try to store. Clearly, to avoid throwing away significant parts of your input, the machine precision should exceed your data precision.</p>
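<p>The difference between machine precision and data precision can be seen with a small experiment. This is only a sketch in Python rather than MySQL: it assumes that packing a value into a 32-bit IEEE float with the standard <code>struct</code> module behaves like storing it in a FLOAT column, and the helper name <code>store_as_float32</code> is made up for illustration.</p>

```python
import struct


def store_as_float32(value):
    """Round-trip a Python float (64-bit) through a 32-bit IEEE float.

    Assumption: this stands in for what a FLOAT column would keep.
    """
    return struct.unpack("!f", struct.pack("!f", value))[0]


height_cm = 185.8
stored = store_as_float32(height_cm)

# The stored value is not exactly 185.8, but the four significant
# digits of the measurement survive intact.
print(stored == height_cm)   # False
print(f"{stored:.4g}")       # 185.8

# A value carrying more digits than FLOAT can hold loses the tail.
coordinate = 49.2421627750
print(f"{store_as_float32(coordinate):.10f}")
```

<p>The last line prints a value that agrees with the input only in its leading digits, which is the machine precision running out before the data does.</p>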
<p>Consider a <a href="http://geocoder.ibegin.com/downloads/canada_cities.zip">CSV file of latitude and longitude</a>, and notice that every value has 10 digits after the point:</p>
<pre><span style="white-space: pre;"> </span>Afton Station,NS,45.6051050000,-61.6974950000</pre>
<pre><span style="white-space: pre;"> </span>Agassiz,BC,49.2421627750,-121.7496169988</pre>
<p>This does not mean that every value has 13 digits of precision &#8211; if that were so, then these coordinates would be accurate to within about 0.01mm on the ground! This is clearly not possible.</p>
<p>Let&#8217;s say we merely want to stick markers on a web page showing a national map. One hundred metre resolution would be more than adequate. Given an Earth circumference of 40,075,020 metres, one degree is about 111,320 metres, so 100 metres is approximately 1/1113 of a degree. While three digits (0.001) can represent 1/1000ths of a degree, this is not quite precise enough, so let&#8217;s keep at least four fractional digits after the point. Allowing for up to three digits before the point, we are therefore looking to represent about 7 significant figures, for example:</p>
<pre><span style="white-space: pre;"> </span>Afton Station,NS,45.60511,-61.69750</pre>
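<p>The degree arithmetic above is easy to check. The following sketch uses the circumference figure from the text; the variable names are only illustrative.</p>

```python
EARTH_CIRCUMFERENCE_M = 40_075_020  # figure used in the text

metres_per_degree = EARTH_CIRCUMFERENCE_M / 360
print(metres_per_degree)                # 111319.5

# 100 metres expressed as "1/N of a degree"
print(round(metres_per_degree / 100))   # 1113

# Ground distance covered by one unit in the last fractional digit
for digits in (3, 4):
    step_m = metres_per_degree / 10**digits
    print(digits, round(step_m, 1))
```

<p>Three fractional digits give steps of a little over 111 metres, just short of the 100 metre target, while four give steps of about 11 metres.</p>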
<p>Which of FLOAT or DOUBLE is the right type to use for such values? Let&#8217;s investigate.</p>
<p>A key difference between floating point types and fixed point representations (such as DECIMAL) is that while the overall binary precision is fixed, <strong>the precision of the fractional part of the value can vary!</strong> The available precision depends on the magnitude of the value. The highest precision is available for values closest to zero, and precision gets worse as numbers increase in magnitude (halving at every power of two).</p>
<p>To understand the effect of this, it is necessary to examine floating point representation in more detail. (I am going to handwave more esoteric features of <a href="http://en.wikipedia.org/wiki/IEEE_754-1985">the IEEE standard</a> and try to give a general picture. In particular I am not going to talk about rounding, biased exponents, or denormalised numbers.)</p>
<p>Floating point values have three parts:</p>
<ul>
<li>a sign (+/-)</li>
<li>an &#8220;exponent&#8221; (binary scale factor)</li>
<li>the value itself (known as the fraction or &#8220;mantissa&#8221;).</li>
</ul>
<p>These are analogous to decimal &#8220;scientific&#8221; or &#8220;exponential&#8221; notation that you may already be familiar with (e.g. 76.4935 = 7.64935 x 10^1 or 7.64935E+1, where 1 is the decimal exponent).</p>
<p>The combination of these fields precisely defines a rational number, whose value is ideally &#8220;close enough&#8221; to the value you need to store (which was imprecise to begin with, so the approximation involved in converting to floating point representation is normally not a problem).</p>
<p>The overall precision available is determined by the bits allowed to store the fraction. For reference,</p>
<ul>
<li>FLOAT allows 23 physical bits for the fraction (with hidden bit, effectively 24 bits of fraction)</li>
<li>DOUBLE allows 52 physical bits for the fraction (with hidden bit, effectively 53)</li>
</ul>
<p>Respectively, this gives roughly 7 and 15 reliable decimal digits across the whole figure (counting digits both before and after the point). So FLOAT probably meets our 7 digit requirement.</p>
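<p>Where do these digit counts come from? Each binary digit is worth log10(2), or about 0.3, decimal digits. A rough sketch of the conversion (the function name is hypothetical, and rounding down is a simplification of what the standard actually guarantees):</p>

```python
import math


def decimal_digits(fraction_bits):
    """Approximate decimal digits carried by a binary fraction."""
    return fraction_bits * math.log10(2)


for name, bits in (("FLOAT", 24), ("DOUBLE", 53)):
    digits = decimal_digits(bits)
    print(name, bits, round(digits, 2), math.floor(digits))
```

<p>This prints about 7.22 digits for FLOAT and 15.95 for DOUBLE, which round down to the 7 and 15 quoted above.</p>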
<p>To make sure, let&#8217;s work backwards and confirm just how precisely a FLOAT value can represent a latitude. To do this I will show how an example value is converted into the FLOAT representation. None of the values in our table will have latitudes greater than 77 degrees, so I will pick 76.4935. (Higher values have larger exponents and hence the least available precision for the fractional part, so are the safest test.)</p>
<p>First we need to determine the correct exponent for the value. Then we can work out the real-world &#8220;resolution&#8221; of the number, i.e. how much the actual value changes if we change the fraction by the smallest possible amount (i.e., in its least significant bit).</p>
<p>The exponent is the smallest power of 2 that, used as a divisor, &#8220;normalises&#8221; the value into the range from 0.5 up to (but not including) 1. Since our starting value is more than 1, we look for a divisor among the powers of 2: 1, 2, 4, 8, 16, 32, 64, 128, 256&#8230; A glance at this list shows us that for 76, a divisor of 128 (2^7) is the right one, as it is the first power of 2 larger than the value; i.e., the floating point exponent is 7. And the resulting fraction part is 76.4935 / 128, or in decimal, 0.59760546875.</p>
<p>(Normalisation works similarly for starting values less than one, but in the other direction. We multiply by the largest power of 2 that leaves the value less than one, and store the negative exponent. Negative values are simply dealt with by converting to positive before normalising, and noting a negative sign.)</p>
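<p>Python&#8217;s standard <code>math.frexp</code> function performs this same normalisation, returning a fraction between 0.5 and 1 together with the power-of-2 exponent. It works on 64-bit values rather than FLOAT, but the exponent is the same. A quick check of the numbers above:</p>

```python
import math

# A value greater than one: divide by 2**7 = 128
fraction, exponent = math.frexp(76.4935)
print(fraction, exponent)        # 0.59760546875 7

# A value less than one: multiply by 2**3 = 8, store exponent -3
print(math.frexp(0.1))           # (0.8, -3)

# A negative value: same fraction and exponent, with the sign carried along
print(math.frexp(-76.4935))      # (-0.59760546875, 7)
```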
<p>Since <strong>every normalised non-zero number begins with a binary &#8216;1&#8217;,</strong> IEEE representation cleverly implies this &#8220;hidden&#8221; 1 bit, and doesn&#8217;t physically store it. This frees up one more bit for the fraction, giving a total of 24 bits for FLOAT precision. (Because of this, and &#8220;exponent biasing&#8221;, the binary sequence shown below isn&#8217;t the <em>actual</em> IEEE bit pattern representing this number.)</p>
<p>Assuming a fraction of 24 bits, let&#8217;s examine its &#8220;binary expansion&#8221;. This is effectively just the result of 0.59760546875 x 2^24, written out in base 2:</p>
<pre>1001100 . 01111110010101100  (total 24 bits including "hidden" bit)</pre>
<pre>^^^^^^^   ^^^^^^^^^^^^^^^^^</pre>
<pre>whole #   fraction part</pre>
<pre>=    76 . 4935 (approx)</pre>
<p>Because the exponent is 7, the first 7 bits in my binary sequence above are the whole number part (= binary 1001100 = 76). I&#8217;ve put a gap where the &#8220;binary point&#8221; belongs. Written out like this, we can see that we have 17 bits to the right of this point. 2^17 = 131072; so, around this magnitude of 76 degrees (and up to 128, as at 128 the binary exponent increases to 8), we can resolve no worse than 1/131072th of a degree. This is enough bits for five decimal digits (which only requires 1/100000th).</p>
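<p>Both the binary expansion and the resolution can be reproduced with a few lines of Python. This is a sketch under the same simplifications as the text: the expansion is obtained by scaling and rounding, and the helper names are invented for illustration. The second half uses the <code>struct</code> module to nudge the least significant bit of a real 32-bit float and measure how far the value moves.</p>

```python
import struct

value = 76.4935

# 0.59760546875 x 2^24 is the same as 76.4935 x 2^17
scaled = round(value * 2**17)
bits = format(scaled, "b")
whole, fraction = bits[:7], bits[7:]
print(whole, ".", fraction)   # 1001100 . 01111110010101100
print(len(bits))              # 24


def float32_to_bits(x):
    """Bit pattern of x stored as a 32-bit IEEE float, as an integer."""
    return struct.unpack("!I", struct.pack("!f", x))[0]


def bits_to_float32(pattern):
    """Inverse of float32_to_bits."""
    return struct.unpack("!f", struct.pack("!I", pattern))[0]


# Change the fraction by one in its least significant bit
pattern = float32_to_bits(value)
step = bits_to_float32(pattern + 1) - bits_to_float32(pattern)
print(step == 2**-17)         # True
```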
<p>So how precise on the ground will this be?</p>
<p>1/131072th of a degree of latitude is about 0.85 metres: much better than our hoped for resolution of 100 metres. So we have shown that FLOAT is more than adequate for the job. This is not to say that FLOAT is the correct choice for <strong>all</strong> geographic uses; only that it is adequate for this use, where we decided 100m resolution was enough. On the other hand, source data for geocoding may need more precision than FLOAT can deliver. (Naturally the extra precision must be present in the source data. Simply using a more precise type cannot add any precision to the original measurement, of course <img src='http://openquery.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> )</p>
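<p>The 0.85 metre figure follows from the earlier degree arithmetic. A minimal sketch, again using the circumference quoted above and a hypothetical helper name:</p>

```python
METRES_PER_DEGREE = 40_075_020 / 360  # circumference figure from the text


def ground_resolution_m(fraction_bits):
    """Metres covered by the smallest step, given the number of
    binary digits available after the binary point."""
    return METRES_PER_DEGREE / 2**fraction_bits


# FLOAT has 17 bits after the binary point for values from 64 up to 128
print(round(ground_resolution_m(17), 2))   # 0.85
```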
<p>The analysis above can be done for DOUBLE, of course. For interest&#8217;s sake, the equivalent binary expansion is:</p>
<pre style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;">1001100 . 0111111001010110000001000001100010010011011101  (total 53 bits including "hidden" bit)</pre>
<p>We have 29 more bits to play with, or a total of 46 fraction bits after the whole number part, at this magnitude. This is ridiculously precise, and can resolve no worse than 1/70368744177664th of a degree; or about 0.0000016mm on the surface of the globe. (This is enough to represent 13 decimal digits after the point.)</p>
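<p>For the curious, the DOUBLE figures can be checked the same way. This sketch assumes the same 40,075,020 metre circumference; the names are illustrative only.</p>

```python
import math

DOUBLE_FRACTION_BITS = 46  # 53 bits in total, minus 7 for the whole number part

steps_per_degree = 2**DOUBLE_FRACTION_BITS
print(steps_per_degree)    # 70368744177664

# Decimal digits after the point that 46 binary digits can cover
print(math.floor(DOUBLE_FRACTION_BITS * math.log10(2)))   # 13

# Smallest step on the ground, in millimetres
metres_per_degree = 40_075_020 / 360
step_mm = metres_per_degree / steps_per_degree * 1000
print(f"{step_mm:.2e}")    # 1.58e-06
```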
<p>This example has shown how knowing a little of how floating point works can help you be confident about issues of precision, when choosing types to represent approximate values, or measurements &#8211; rather than automatically falling back on DOUBLE or even DECIMAL as a &#8220;paranoid default&#8221;. The key is to know your data, and understand how much precision you have, and how much your application needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/data-numeric-types/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Trivia: identify this replication failure</title>
		<link>http://openquery.com/blog/trivia-identify-replication-failure</link>
		<comments>http://openquery.com/blog/trivia-identify-replication-failure#comments</comments>
		<pubDate>Wed, 28 Oct 2009 08:09:54 +0000</pubDate>
		<dc:creator>arjen</dc:creator>
				<category><![CDATA[Good practice / Bad practice]]></category>
		<category><![CDATA[master]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[open query]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[slave]]></category>
		<category><![CDATA[trivia]]></category>

		<guid isPermaLink="false">http://openquery.com/blog/?p=1087</guid>
		<description><![CDATA[We got good responses to the &#8220;identify this query profile&#8221; question. Indeed it indicates an SQL injection attack. Obviously a code problem, but you must also think about &#8220;what can we do right now to stop this&#8221;. See the responses and my last note on it below the original post. Got a new one for [...]]]></description>
			<content:encoded><![CDATA[<p>We got good responses to the &#8220;<a href="http://openquery.com/blog/trivia-identify-query-profile">identify this query profile</a>&#8221; question. Indeed it indicates an SQL injection attack. Obviously a code problem, but you must also think about &#8220;what can we do right now to stop this&#8221;. See the responses and my last note on it below the original post.</p>
<p><strong>Got a new one for you!</strong></p>
<p>You find a system with broken replication; it could be a slave or one server in a dual master setup. The IO thread is still running, but the SQL thread is not, and the last error is as follows (yes, the error string is exactly this, very long &#8211; sorry, I did not paste this string into the original post &#8211; updated later):</p>
<blockquote><p><em>&#8220;Could not parse relay log event entry. The possible reasons are: the master&#8217;s binary log is corrupted (you can check this by running &#8216;mysqlbinlog&#8217; on the binary log), the slave&#8217;s relay log is corrupted (you can check this by running &#8216;mysqlbinlog&#8217; on the relay log), a network problem, or a bug in the master&#8217;s or slave&#8217;s MySQL code. If you want to check the master&#8217;s binary log or slave&#8217;s relay log, you will be able to know their names by issuing &#8216;SHOW SLAVE STATUS&#8217; on this slave.&#8221;</em></p></blockquote>
<p>In other similar cases the error message is about something else but the query it shows with it makes no sense. To me, that essentially says the same as the above.</p>
<p>The server appears to have been restarted recently.</p>
<p>What&#8217;s wrong, and what&#8217;s your quickest way to get replication going again given this state?</p>
]]></content:encoded>
			<wfw:commentRss>http://openquery.com/blog/trivia-identify-replication-failure/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
