<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Trivia: identify this replication failure</title>
	<atom:link href="http://openquery.com/blog/trivia-identify-replication-failure/feed" rel="self" type="application/rss+xml" />
	<link>http://openquery.com/blog/trivia-identify-replication-failure</link>
	<description>About MySQL, Drizzle, MariaDB and more!</description>
	<lastBuildDate>Mon, 19 Mar 2012 14:26:12 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Ryan</title>
		<link>http://openquery.com/blog/trivia-identify-replication-failure/comment-page-1#comment-2145</link>
		<dc:creator>Ryan</dc:creator>
		<pubDate>Wed, 28 Oct 2009 17:49:04 +0000</pubDate>
		<guid isPermaLink="false">http://openquery.com/blog/?p=1087#comment-2145</guid>
		<description>I agree with Shlomi&#039;s example as the most likely cause.  However I&#039;ve also seen the same behaviour when a router was silently garbling packets -- with enough frequency that  some would pass through the crude TCP checksum and produce invalid SQL.  Solution was the same though.</description>
		<content:encoded><![CDATA[<p>I agree with Shlomi&#8217;s example as the most likely cause.  However I&#8217;ve also seen the same behaviour when a router was silently garbling packets &#8212; with enough frequency that  some would pass through the crude TCP checksum and produce invalid SQL.  Solution was the same though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Schneller</title>
		<link>http://openquery.com/blog/trivia-identify-replication-failure/comment-page-1#comment-2142</link>
		<dc:creator>Daniel Schneller</dc:creator>
		<pubDate>Wed, 28 Oct 2009 10:11:21 +0000</pubDate>
		<guid isPermaLink="false">http://openquery.com/blog/?p=1087#comment-2142</guid>
		<description>Writing binlogs and the actual table data is not atomic. With InnoDB, the binlog is only written after the transaction has been committed. If the server crashes (or is made to crash) while it&#039;s writing the binlogs, it might leave them unfinished, i.e. corrupt, while InnoDB rolls back the transaction.

The sync_binlog and innodb_support_xa settings should prevent this; however, there can still be problems with binlogs that end up too short.

To repair this there are several options. If there is only a small to moderate amount of data, using a fresh dump might be a solution. If this takes too long or is otherwise impractical, you need to find out up to where the slave executed correctly - the SHOW SLAVE STATUS command helps here. Then look at the master&#039;s corresponding binlog and either set sql_slave_skip_counter appropriately (if the statement was executed correctly after the crash) or, better, reset the replication with CHANGE MASTER TO and point it to the new binlog the master would have started when it was brought back up again.

Anyway maybe a mk-table-checksum based verification would be advisable to see if you actually got it right :-)</description>
		<content:encoded><![CDATA[<p>Writing binlogs and the actual table data is not atomic. With InnoDB, the binlog is only written after the transaction has been committed. If the server crashes (or is made to crash) while it&#8217;s writing the binlogs, it might leave them unfinished, i.e. corrupt, while InnoDB rolls back the transaction.</p>
<p>The sync_binlog and innodb_support_xa settings should prevent this; however, there can still be problems with binlogs that end up too short.</p>
<p>To repair this there are several options. If there is only a small to moderate amount of data, using a fresh dump might be a solution. If this takes too long or is otherwise impractical, you need to find out up to where the slave executed correctly &#8211; the SHOW SLAVE STATUS command helps here. Then look at the master&#8217;s corresponding binlog and either set sql_slave_skip_counter appropriately (if the statement was executed correctly after the crash) or, better, reset the replication with CHANGE MASTER TO and point it to the new binlog the master would have started when it was brought back up again.</p>
<p>Anyway maybe a mk-table-checksum based verification would be advisable to see if you actually got it right <img src='http://openquery.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hartmut</title>
		<link>http://openquery.com/blog/trivia-identify-replication-failure/comment-page-1#comment-2140</link>
		<dc:creator>hartmut</dc:creator>
		<pubDate>Wed, 28 Oct 2009 09:19:59 +0000</pubDate>
		<guid isPermaLink="false">http://openquery.com/blog/?p=1087#comment-2140</guid>
		<description>replication event checksums anyone? :(</description>
		<content:encoded><![CDATA[<p>replication event checksums anyone? <img src='http://openquery.com/blog/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul</title>
		<link>http://openquery.com/blog/trivia-identify-replication-failure/comment-page-1#comment-2138</link>
		<dc:creator>Paul</dc:creator>
		<pubDate>Wed, 28 Oct 2009 08:53:58 +0000</pubDate>
		<guid isPermaLink="false">http://openquery.com/blog/?p=1087#comment-2138</guid>
		<description>Without knowing the exact error it&#039;s not possible to make a real decision; however, it&#039;s highly likely that the master DB has crashed and its binlog has become corrupted. The slave DB has then become stuck reading to the end of the master&#039;s binlog file with an &quot;impossible position&quot; type error.

Thus the quickest way to get the slave working again is to get the new master DB&#039;s binlog name from the column &quot;Master_Log_File&quot; using the query:

SLAVE&gt; SHOW SLAVE STATUS;

Then change the replication position with:

SLAVE&gt; STOP SLAVE; -- to stop IO thread
SLAVE&gt; CHANGE MASTER TO MASTER_HOST=&#039;&#039;, MASTER_USER=&#039;&#039;, MASTER_PASSWORD=&#039;&#039;, MASTER_LOG_FILE=&#039;&#039;, MASTER_LOG_POS=0;
SLAVE&gt; START SLAVE;

This doesn&#039;t ensure data integrity, but as mentioned above you need to know the nature of the error to start making decisions to ensure data integrity.</description>
		<content:encoded><![CDATA[<p>Without knowing the exact error it&#8217;s not possible to make a real decision; however, it&#8217;s highly likely that the master DB has crashed and its binlog has become corrupted. The slave DB has then become stuck reading to the end of the master&#8217;s binlog file with an &#8220;impossible position&#8221; type error.</p>
<p>Thus the quickest way to get the slave working again is to get the new master DB&#8217;s binlog name from the column &#8220;Master_Log_File&#8221; using the query:</p>
<p>SLAVE&gt; SHOW SLAVE STATUS;</p>
<p>Then change the replication position with:</p>
<p>SLAVE&gt; STOP SLAVE; -- to stop IO thread<br />
SLAVE&gt; CHANGE MASTER TO MASTER_HOST='', MASTER_USER='', MASTER_PASSWORD='', MASTER_LOG_FILE='', MASTER_LOG_POS=0;<br />
SLAVE&gt; START SLAVE;</p>
<p>This doesn&#8217;t ensure data integrity, but as mentioned above you need to know the nature of the error to start making decisions to ensure data integrity.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shlomi Noach</title>
		<link>http://openquery.com/blog/trivia-identify-replication-failure/comment-page-1#comment-2137</link>
		<dc:creator>Shlomi Noach</dc:creator>
		<pubDate>Wed, 28 Oct 2009 08:51:47 +0000</pubDate>
		<guid isPermaLink="false">http://openquery.com/blog/?p=1087#comment-2137</guid>
		<description>What&#039;s wrong: the relay log file is corrupted, e.g. the server restarted as it was being written.
Possible solution: check out the master log file/position in &quot;show slave status&quot; (not the relay log one), and try &quot;stop slave; change master to ...; start slave&quot;, thus reading from that same position again. The &quot;change master to&quot; will discard the existing relay log(s).</description>
		<content:encoded><![CDATA[<p>What&#8217;s wrong: the relay log file is corrupted, e.g. the server restarted as it was being written.<br />
Possible solution: check out the master log file/position in &#8220;show slave status&#8221; (not the relay log one), and try &#8220;stop slave; change master to &#8230;; start slave&#8221;, thus reading from that same position again. The &#8220;change master to&#8221; will discard the existing relay log(s).</p>
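<p>A minimal sketch of the steps above, assuming one row of SHOW SLAVE STATUS output is already available as a Python dict. The helper name build_change_master and the example values are hypothetical. Relay_Master_Log_File and Exec_Master_Log_Pos are the SHOW SLAVE STATUS columns that hold the master coordinates of the last event the slave executed, which is the position to read from again.</p>
<pre>
```python
def build_change_master(status):
    """Build the statements that re-point a slave at the master
    coordinates its SQL thread last executed.

    `status` is a dict holding one row of SHOW SLAVE STATUS output.
    """
    # Master binlog file and position of the last executed event,
    # not the relay log coordinates (Relay_Log_File / Relay_Log_Pos).
    log_file = status["Relay_Master_Log_File"]
    log_pos = int(status["Exec_Master_Log_Pos"])

    # Escape backslashes and single quotes for the SQL string literal.
    quoted = log_file.replace("\\", "\\\\").replace("'", "''")

    return [
        "STOP SLAVE",
        "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d" % (quoted, log_pos),
        "START SLAVE",
    ]


# Example row; the file name and position are made-up values.
example_status = {
    "Relay_Master_Log_File": "mysql-bin.000042",
    "Exec_Master_Log_Pos": "98765",
}
statements = build_change_master(example_status)
for statement in statements:
    print(statement + ";")
```
</pre>
<p>The statements are meant to be run in order on the slave. Host, user and password are left out on purpose: CHANGE MASTER TO keeps the previous values of options that are not specified.</p>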
]]></content:encoded>
	</item>
</channel>
</rss>
