Baron Schwartz
How Percona Toolkit divides tables into chunks
The tools we’ve redesigned in Percona Toolkit recently have moved away from a legacy technique for operating on small numbers of rows at a time, towards a more reliable and predictable method. We call the old version “chunking” and the new version “nibbling.” Many other MySQL tools I’ve seen either operate on entire tables, or use the “chunking” technique and are exposed to the problems it creates. I’ll compare the two briefly to explain the differences.
Chunking attempts to divide a table into ranges of rows of a desired size, such as 1000 rows. It does this by examining the minimum and maximum value of the primary key (or other suitable index), estimating the number of rows in the table, and dividing one by the other to create a list of boundary values. Suppose that the minimum value is 1 and the maximum is 1000000, and there are an estimated 100000 rows in the table. The chunk boundaries will fall on intervals of 10000. We can operate on 1-10000, 10001-20000, and so on.
This has a number of problems that might not be obvious at first. It practically requires a numeric, single-column index[1]. It can (and does, in practice) create oversized chunks when values are sparse in some places and packed tightly in others. It leads to a lot of code complexity and bugs when the table’s data changes (especially if new rows are inserted) as the tool works. It has edge cases related to special values such as 0, NULL, the date ’0000-00-00′, the beginning of the table, and the end of the table. In short, through years of experience we found that it simply doesn’t work well enough. It works in 95% of cases, but not all that well, and in a small fraction of cases it causes serious problems, such as a massively oversized chunk that interferes with other processes on the server. If you’d like more details on these types of problems, there is a lot of information in old bug reports.
The new “nibbling” technique we are using is actually not new at all. It is something I learned before I was even involved in MySQL very much. The context is in some old blog posts I wrote about archiving and purging. The idea is to fetch a row and use that as the lower bound of the first chunk (or “nibble” — we use the terms pretty interchangeably) of rows. Then use a SELECT with a LIMIT to find the upper boundary of the nibble, as well as the lower bound of the next nibble. After processing the nibble, repeat the steps with the lower bound of the next nibble. The disadvantage of this process is that there is quite a bit of code complexity when you get multi-column indexes, with NULL-able columns adding a whole new twist to the game. However, this is long since solved, and we have a reliable and well-tested library of routines for dealing with this. In return we get predictable behavior on practically any table with an index, even if it isn’t a unique index. The only remaining edge case is when a non-unique index contains a range of identical values that is larger than the desired chunk size, but that is easily detected and handled in application-specific ways.
The result of the work we’ve been doing recently, replacing “chunking” with “nibbling” in pt-table-checksum and pt-online-schema-change, is reliable and safe nibbling behavior. This has the further benefit of allowing us to do much more sophisticated techniques, such as dynamically varying the size of each chunk of rows in response to changing conditions such as server load or varying row size and complexity. Our recent tools are designed to target a predictable query time for each nibble, rather than a specific number of rows. I am happy to report that, in extensive real-world usage, they are able to stay extremely close to the target time even as conditions vary dramatically.
[1] Although in theory you can operate on the first column in a multi-column index, real-world experience shows that the first column in such indexes tends to have low cardinality, thus creating enormous chunks. And although it is possible to treat date, timestamp, and some other types as numeric, in practice it is very difficult. What number corresponds to 0000-00-00? Our attempts to create algorithms that would work on non-numeric types such as character-based columns were tremendously difficult and the results were discouraging; again, in the real world it doesn’t work well.
Further Reading:
The first guest post on Xaprb blog
I got a really intriguing email today. As you might imagine, I get a lot of email offering to advertise, or trade links, or guest post, etc etc. But this is the most compelling one I’ve ever gotten:
I came across your blog http://www.xaprb.com/blog/2009/03/13/50-things-to-know-before-migrating-oracle-to-mysql/ a few weeks ago as while conducting research for a website that I contribute to. The website aims to look at the progressive areas of early childhood psychology. The idea was originally to create an objective collection of resources for people interested in psychology but it has since grown into something much wider (the resources come together at [redacted]). Today, the project looks at important factors in childhood development that pay dividends in the future.
I think a guest blog post that illuminates the the advancement of our understanding of childhood development would be interesting for your audience. If you’re interested, I would love to write something for you and perhaps start a friendly dialogue. What do you think?
Thanks, and I look forward to hearing from you.
All my best,
Sarah Thompson (sarahjthompson5@gmail.com)
The thing is, I know someone who’s been on the Oprah show and written many, many bestselling books on child psychology, so this perked up my ears right away. The more I thought about it, the more I realized that this is an incredible value to my readers. Everyone interested in migrating between database servers ought to know about childhood psychology. I plan to say yes and to start an entire series of blog posts on this and other highly relevant topics, such as the mating behavior of slugs, the history of central Asia, air pressure in a vacuum, and how many disciples Jesus really had. Look forward to the first installment shortly.
Further Reading:
Free webinar: non-blocking MySQL schema changes
Please join me tomorrow (Wednesday) for a webinar about the new version of pt-online-schema-change, Percona Toolkit’s tool for performing nonblocking ALTER TABLE operations. Registration is free. I will explain how the tool works and cover some typical use scenarios. This might be helpful if you’re trying to decide whether the tool is right for you. I am also planning to leave plenty of time for questions!
Further Reading:
Interest building in Percona XtraDB Cluster
In the last few weeks I’ve been caught off guard by the number of people who’ve told me they have been evaluating Percona XtraDB Cluster (link), and even more surprised at the projects they’re considering it for. Yesterday alone I spoke to several people who have been evaluating it for large, mission-critical enterprise deployments. Some new, some to replace existing systems that use standard MySQL replication. What was interesting is that some people said they’ve been putting it through its paces for months — before we even released it as GA.
Another person said he was evaluating it and tried a bunch of things like killing nodes, and it “just worked.” He sounded like he’d been suspicious: had it REALLY worked? But then, on further investigation, he was able to confirm that yes, it had just worked. The node went away; the cluster as a whole was healthy and happy.
It’s funny how you can get a feeling about the momentum on a product or idea or event. It probably builds upon the little things, like tone of voice or expressions on peoples’ faces. In any case I have this gut feeling about Percona XtraDB Cluster: it’s about to happen.
Further Reading:
The MySQL init-script mess
I don’t think there is a single good-quality MySQL init script for a Unix-like operating system. On Windows, there is the service facility, and I used to write Windows services. Based on that, I believe Windows has a pretty good claim to better reliability for start/stop services with MySQL.
What’s wrong with the init scripts? Well, let me count the reasons! Wait, I’m out of fingers and toes.
I’ll just mention the two annoying ones that I’ve run into most recently. Both are on Debian, though there is nothing especially broken about Debian’s init scripts. The first one comes from parsing my.cnf wrong and not treating pid-file and pid_file identically. The server treats them identically, thus, so should any other program that reads the my.cnf file (there’s this program called my_print_defaults… use it!). The second bug is because Debian uses two configuration files for start/stop services: the init script reads /etc/mysql/debian.cnf for no discernable reason. (I guess they never heard of using [sections] in the /etc/mysql/my.cnf file, or just reading the [mysqld] section.) So if you configure your server to place its socket in a non-default location, you have to redundantly update /etc/mysql/debian.cnf too, or the init script will fail. Duplication of configuration parameters is just stupid, period.
These are fairly mundane bugs. I’ve seen literally dozens more. Part of the problem is that each distribution that packages up and redistributes MySQL tends to ship with their own init script, instead of reusing the official scripts provided by MySQL. Understandable, because mysqld_safe is generic and doesn’t really integrate well with any system’s init facilities. But man, do they reinvent a bunch of lovely bugs, mostly related to things like parsing the .cnf files, handling pid files, handling sockets, special user accounts, braindead look-before-you-leap patterns of pinging before actually doing a task, stupid timeouts, wrong handling of log files and log file rotation, dumb hacks with syslog, failing to check for real evidence of a running process (you can’t trust what a cache file on disk says!), adding facepalm-worthy CHECK TABLES automatically on every table on server startup, and on and on.
The official mysqld_safe script tends to be a little less broken, in my experience, but still has many unlovely behaviors and missing features that I’d consider to be bugs.
I haven’t even mentioned the “manage multiple instances” scripts yet. Boy, do those have a ton of bugs. They do stupid things like grepping configuration files for strings that may or may not be in the configuration files. I remember one emergency case where MySQL couldn’t be started on a box because the string “mysql_multi” didn’t exist in a my.cnf file clearly designed for multiple instances to run. I added a comment to the effect of “# This comment is necessary for mysql_multi to work” and the problem was solved. A sane script would actually check for multiple instance definitions, not for some arbitrary string of characters. Anyway, this is just one tiny example, I don’t mean to dwell on it.
What happens when you have a bad init script? All kinds of things. You can’t shut down the server gracefully, so if you shut down the system, you hard-crash MySQL eventually, and good luck getting replication back after that in most cases. You can’t start the server correctly, or it reports the wrong thing and then tries to start several instances, and the second one borks the first one’s pid file and/or socket, causing the aforementioned shutdown problem or worse. And on it goes.
My principle is usually “don’t complain, do something about it.” But there’s a problem, in this case: writing a good init script is actually a significantly complex software engineering project. It is NOT “just a script.” (Insert my usual rant about the need for an actual test suite.) And that is not something I am working on at the moment, nor has it ever become my priority for the last several years. So in this case I’m complaining, because the writing on the wall says that I am probably never going to work on this, and I’d at least like there to be some visibility about what a serious problem this is.
Distribution maintainers could probably improve the situation significantly by taking a look at each other’s bug reports. If everyone solved the same bugs everyone else has solved (and don’t forget bugs in mysqld_safe, too) that would be a big step forward.
Further Reading:
A post-conference tradition
It’s nice that this happened again this year:
The question I ask myself every year is, When am I going to get time to study all of these great blog posts?
Further Reading:
Percona Toolkit gripes welcome
What are your pet peeves about the tools? I have a lot. For example, pt-table-sync doesn’t let me do things I want it to do sometimes (and I know it could be made to do them). Another example: pt-query-digest’s tcpdump parser doesn’t pay attention to TCP sequence numbers, so sometimes it invents a really long-running query where there isn’t one. Post your gripes, big and small, in the comments!
(This blog post is an attempt to do a 37signals approach to prioritizing: the stuff people complain about repeatedly is what you should focus on. No promises about fixing the gripes — I just want to hear them!)
Further Reading:
Another great MySQL 5.6 milestone release
Oracle has released another huge batch of improvements to MySQL slated for MySQL 5.6. The pace of innovation in MySQL is accelerating rapidly and the improvements are impressive. I have not read all of the blog posts thoroughly yet, but I’ve skimmed and it looks fantastic.
The link posted above is ONE of the sets of improvements: there are also a bunch of other posts about new improvements to the query optimizer, InnoDB, and probably more. But I don’t see a single table-of-contents blog post for those that I can link to. Hence I recommend that you look at Planet MySQL to find these posts.
Further Reading:
Automated, integrated sharding — the new killer database feature
MySQL became wildly successful in part because it had built-in, simple replication. Sure, it had lots of interesting failure scenarios and was not great at first — it is much better these days — but it was nevertheless successful because there was a single, out-of-the-box, not-very-complex way to do replication. I have opined many times before that this was one of the killer features missing from PostgreSQL. I think that can large explain why MySQL became more popular more quickly.
The new killer feature is automatic sharding, in my opinion. If you’re not accustomed to the word, “sharding” means partitioning of a large dataset across many servers.
It is easy to poke fun at MongoDB’s current limitations, but for all that, it has a story to tell about sharding. There is One Right Way To Do It in MongoDB, and it’s a part of the product.
I don’t see sharding being added into the core of MySQL itself, but there are some very interesting efforts headed towards MySQL. There are at least the following companies providing sharding via a proxy or middleware solution, with a lot of other features also available in some products:
In addition, there are community-based efforts, such as Shard-Query and the Spider storage engine. And there’s MySQL (NDB) Cluster, and commercial rip-out-and-plug-in replacements for MySQL such as Clustrix.
Am I missing any? I probably am. You can see and talk to many of these companies at this week’s MySQL conference, by the way.
Further Reading:
Sessions I want to see at MySQL conference 2012
In case you live under a rock, the MySQL conference starts on Tuesday. There are 8 concurrent tracks of content. Aside from my own talks, here are the sessions I would most like to see.
On Tuesday:
- Tutorial: Innodb and XtraDB Architecture and Performance Optimization. Peter’s talks are always great, and this is a perennial favorite. If you have never been, you should go to this one. You will realize how little you really know about InnoDB/XtraDB, and how much it matters.
- Tutorial: Linux and H/W optimizations for MySQL. Last year I tweeted that Yoshinori’s tutorial was the 3rd edition of High Performance MySQL, and I really wasn’t exaggerating that much.
- BoF session: Percona XtraDB Cluster. I believe that this is one of the few truly groundbreaking changes to MySQL — right up there with “MySQL has transactions now” and “MySQL adds replication.”
On Wednesday:
- MySQL Plugins – why should I bother? Sergei is the co-author of a book on plugins. Their power is not to be underestimated.
- Getting InnoDB Compression Ready for Facebook Scale. Compression is very powerful — if we can just get the implementation to work better. Facebook, apparently, has done that.
- The 5 minute DBA: MySQL DBA 101 for Non DBA’s. Get started fast with Matt Yonkovit’s friendly, engaging and relevant talk. Spoiler: it’s longer than five minutes.
- MySQL Optimizer Standoff MySQL 5.6 and MariaDB 5.3. Peter at it again, this time digging into the advances in newer releases of the server, which make a huge difference on some common and important workloads.
- Scaling MySQL Databases for the Web. This is the “Vitess” talk, a technology I’m extremely keen to hear about. It’s now powering YouTube’s backend.
- BoF session: Tungsten User Fest. Tungsten has matured into a very interesting technology in the last couple of years.
On Thursday:
- Oracle to MySQL migration. Marco Tusa has actually implemented some of the most impressive migrations in MySQL’s history.
- Spil Games: Outgrowing an internet startup. This should be a great talk about running MySQL in a busy startup in the real world.
- Here I’m torn between MySQL and SSD: usage and tuning and Common Schema: a framework for MySQL server administration. (I’m always torn between at least 2 or 3 talks, but in this case especially so for some reason.)
- Verifying MySQL Replication Safely With pt-table-checksum 2.0. If you aren’t verifying that your replicas have the same data as their masters, you need to, and this is the tool to do it.
Further Reading:
A review of Clojure In Action by Amit Rathore
Clojure In Action
Clojure In Action, by Amit Rathore. Manning, 2012. About 400 pages.This is a lucid and interesting introduction to Clojure and the LISP family of programming languages. It’s been years since I programmed in LISP and I found myself recalling those days, at the same time as I learned a lot more than I used to know. Indeed, I realized that my knowledge of LISP was only superficial, and that I probably ought to take some time at some point and learn it deeply enough to have the epiphany people talk about. (Can I plead that I’ve had the epiphany with SQL? No? How about XSLT — it’s basically LISP in XML? Drat.)
The book is in two parts. First you learn about the fundamentals of Clojure, how it works on the JVM, state and concurrency, and so forth. In the second part there’s a lot of deeper and more specific topics. You could say that the first part is about learning the language and environment, and the second part is about how to really put Clojure, um, into action. Conventions, idioms, and the like make their appearance in both parts, but in the second part there’s a lot of specific topics like building web applications with Clojure and creating DSLs.
I haven’t read the whole book. I saved parts of it for later. Perhaps that’s a shame, but perhaps I’m just not ready for them yet, either. In any case I found the parts that I read to be well worth my time.
Further Reading:
- A review of Pentaho Solutions by Roland Bouman and Jos van Dongen
- A review of MongoDB, the Definitive Guide by Chodorow and Dirolf
- A review of SQL and Relational Theory by C. J. Date
- A review of Optimizing Oracle Performance by Cary Millsap
- Book Review: Building powerful and robust websites with Drupal 6
Knowing when it’s lose-lose
I’ve been a consultant for over five years now and I’m just beginning to learn some basic things. One of them is how to spot a lose-lose customer relationship. I say I’m just beginning because I continue to find new lose-lose scenarios that I don’t see coming.
If things are set up such that nobody can succeed, it’s best to walk away from the deal. With 1400+ customers now, you might think we haven’t walked away from many deals, but that’s not true. We are very careful to choose our customers. They don’t just choose us, we have to say yes in return. I used to be the sole sales person and even in the hungry early days I politely declined or referred a lot of people elsewhere.
Some interesting rules got written on the wiki, such as never do business with someone who confesses to being drunk. This applies also to people who’ve been up all weekend desperately changing things at random, drinking Red Bull, and can’t even remember what the original problem was or stop their minds spinning long enough to describe it clearly. I cut my losses and backed out of one such case, after essentially solving it and not billing for it, because I could see that I’d already lost, and was going to lose more and more badly the longer I stayed engaged. That person still posts online occasionally about what terrible service he got. Public criticism is another no-win situation: you can never take troll bait and come out ahead. (I’ve let myself be goaded into responding a couple of times. It’s hard not to do.)
Another of my rules is that anyone who insults one of your consultants is trouble. I remember someone who thought Vadim’s Ukrainian accent was another nationality I won’t mention, and demanded to talk to a senior person, not realizing who Vadim was at all. It would have been funny if it wasn’t so bigoted.
I remember a disputed invoice from someone who claimed that he couldn’t understand the consultant, who was European (and had successfully completed the work). The customer had a strong rural Southern accent, so I dropped back into just a little bit of my own native Southern accent to imitate him, and said “I know, it can be hard to understand a [nationality] accent.” The man immediately warmed up to me and confessed that “I’d, you know, I’d hoped someone from America would do the work, because, you know, well, I support our troops, you know what I mean?” Racism is racism, y’all.
Then there is the customer who hires you because he (or she) can’t deliver the results he’s supposed to for his business, but has insisted internally that his approach is the only right one. He’s painted himself into a corner, with no way to save face, and now he’s looking to make you the scapegoat. If you give him the advice he actually needs, he’ll vilify you for not solving the problem he asked you to solve, because you’re making him look bad. If you figure out that this is happening, and agree with him, the project will fail and he’ll blame you then too. All he wants to do is get you to demonstrate that it’s not his fault he can’t satisfy his boss’s impossible demands.
Still another scenario is when you end up working with people who didn’t hire you and who don’t buy into your consulting at all. They may resent the boss bringing in the outside help because it implies that they’re incompetent, for example. Or you may have the support of the technical staff, but not buy-in from the upper management. I remember one instance when the administrative staff simply kept changing the access credentials. By the time we got access again, all of the systems we’d been working on were gone and replaced by new systems, and the landscape was completely different, but nobody would tell us what the new situation was because they were too busy doing their “real” jobs, which didn’t include babysitting the consultants.
There are probably dozens of other war stories about situations that are set up to fail from the start. Some of them, in hindsight, could have been managed differently and ended up as customer love stories. But a consultant who can handle an extraordinarily difficult technical problem combined with a delicate interpersonal situation is a rare one indeed, and it’s hard to predict whether even the most skillful consultant can rescue a difficult situation.
For example, it’s sometimes possible, with great interpersonal skills, to win the trust of the technical staff in situations such as the “revolving door VPN access” customer. If you get a toe in the door, you may be able to demonstrate that you are helping their careers without taking any credit for it yourself, for example. The book The Trusted Advisor talks a lot about this type of personal relationship building. But this is a touch-and-go situation at best, and only the most talented and observant consultants in the best of circumstances are likely to succeed.
In the end, consulting is a personal relationship. Businesses don’t have relationships; people do. And you have to choose your business relationship partners carefully, just as you choose your friends and romantic partner. Developing the ability to spot a person or situation that’s stacking the deck against you can be a valuable asset. It’s much better never to enter a poisoned relationship than to have to leave one.
Further Reading:
Progress on High Performance MySQL, 4th Edition
With the 3rd edition of High Performance MySQL finally complete, I’ve begun work on the 4th edition. As you know, technology moves much faster than printing presses, and a book is outdated very quickly, so this is a never-ending project. I’m also outlining the 5th edition in anticipation of starting it immediately afterwards.
I’m looking for your input on what I should cover in the new edition. Should I discuss MySQL’s intra-query parallelization across multiple CPU cores? Should I explore how cloud computing platforms enable higher performance at lower cost than dedicated hardware? Should I explain the bizarre bug in MySQL’s datetime type that causes it to skip the day after March 31st every year? What are your suggestions?
Further Reading:
High Performance MySQL 3rd Edition is real!
O’Reilly authors get 10 copies of their own books for free, and my copies of the third edition of High Performance MySQL arrived yesterday. Now it’s official! It feels nice to actually hold it in my hand.
A few people have asked me about messages from Amazon saying that their ship date has changed. I don’t know anything about that; maybe Amazon just made a wild guess the first time and now they actually know something more realistic. Or maybe the book is more popular than expected? It’s currently at position #10 in the SQL category on Amazon, which seems pretty good to me. No “DaVinci Code” to be sure, but not bad for a technical book.
Further Reading:
Shenandoah Ruby Users Group Monthly Meetup
I’ll be speaking at the Shenandoah Ruby Users Group Monthly Meetup tomorrow in Harrisonburg, Virginia. The topic is “Seven Things To Know About MySQL Performance.” See you there!
Further Reading:
Speaking at RubyNation 2012
Wow, I completely forgot to advertise this. I’m speaking Saturday (tomorrow) at RubyNation, which is already well underway (I’m missing the first day, though). My topic is “Seven Things To Know About MySQL Performance.”
Further Reading:
Why You Need to Understand Your Working Set Size
I guest-posted on Fusion-io’s blog about the database’s working set size and the interplay with fast Flash storage. It’s written from a MySQL point of view, but it’s applicable to many types of systems.
Further Reading:
MySQL monitoring meetup tonight!
Here’s your spammy day-of reminder about tonight’s free MySQL meetup. The topic is MySQL Monitoring Bonanza. There will be beer, pizza, pretty charts and graphs, and friends. Free as in beer (and pizza, did I mention that?)!
Further Reading:
Free webinar on monitoring MySQL
If you follow what Percona is up to, you might have noticed that I’ve created a set of high-quality monitoring and graphing plugins for MySQL and related systems. Currently they support Nagios and Cacti. I’ll give a free webinar on March 28th discussing these, and more broadly, discussing how to monitor MySQL successfully to avoid common problems like spammy alerts about nonexistent problems.
Further Reading:
Room discounts about to expire for MySQL conference!
My earlier blog post was wrong, our block of rooms for the conference is still available past the 12th of March. However, there are apparently zero rooms available in the Hyatt other than our reserved block, and our block discount will expire soon (I’m a little hazy on the exact details, because I thought it was the 12th). What I have been told in no uncertain terms is this: “If you know anyone who plans to attend the conference and needs a hotel room, they should book IMMEDIATELY in our room block or they will not be able to get a room in the Hyatt either at a reasonable rate or at all.“
Please don’t ask me to help — I’m not involved in this, just trying to get the information out there. The logistics are beyond my ken.
Update: apparently the room discount rate ends on the 19th when our reservation on the block of rooms is released or something — again, beyond my understanding. I am a Bear of Very Small Brain…
Further Reading:



