Monthly Archives: June 2009

PHPHOST BLOG

Web Hosting Related Articles You May Need

All Abstractions Are Failed Abstractions

In programming, abstractions are powerful things:

Joel Spolsky has an article in which he states:

All non-trivial abstractions, to some degree, are leaky.

This is overly dogmatic – for example, bignum classes are exactly the same regardless of the native integer multiplication. Ignoring that, this statement is essentially true, but rather inane and missing the point. Without abstractions, all our code would be completely interdependent and unmaintainable, and abstractions do a remarkable job of cleaning that up. It is a testament to the power of abstraction and how much we take it for granted that such a statement can be made at all, as if we always expected to be able to write large pieces of software in a maintainable manner.

But they can cause problems of their own. Let’s consider a particular LINQ to SQL query, designed to retrieve the most recent 48 Stack Overflow questions.

var posts = 
  (from p in DB.Posts
  where 
  p.PostTypeId == PostTypeId.Question &&
  p.DeletionDate == null &&
  p.Score >= minscore
  orderby p.LastActivityDate descending
  select p).
  Take(maxposts);

The big hook here is that this is code the compiler actually understands. You get code completion, compiler errors if you rename a database field or mistype the syntax, and so forth. Perhaps best of all, you get an honest to goodness post object as output! So you can turn around and immediately do stuff like this:

foreach (var post in posts.ToList())
{
    Render(post.Body);
}

Pretty cool, right?

Well, that Linq to SQL query is functionally equivalent to this old-school SQL blob. More than functionally, it is literally identical, if you examine the SQL string that LINQ generates behind the scenes:

// verbatim string literal (@"...") so the SQL can span several lines
string query = 
  @"select top 48 * from Posts
  where 
  PostTypeId = 1 and 
  DeletionDate is null and 
  Score >= -4
  order by LastActivityDate desc";

This text blob is of course totally opaque to the compiler. Fat-finger a syntax error in here, and you won’t find out about it until runtime. Even if it does run without a runtime error, processing the output of the query is awkward. It takes row level references and a lot of tedious data conversion to get at the underlying data.

var posts = DB.ExecuteQuery(query);

// rows come back untyped, so every field needs a lookup and a conversion
foreach (var post in posts.ToList())
{
   Render(post["Body"].ToString());
}

So, LINQ to SQL is an abstraction — we’re abstracting away raw SQL and database access in favor of native language constructs and objects. I’d argue that Linq to SQL is a good abstraction. Heck, it’s exactly what I asked for five years ago.

But even a good abstraction can break down in unexpected ways.

Consider this optimization, which is trivial in the old-school SQL blob code: instead of pulling down every single field in the post records, why not pull just the id number? Makes sense, if that’s all I need. And it’s faster — much faster!

select top 48 * from Posts     827 ms
select top 48 Id from Posts    260 ms

Selecting all columns with the star (*) operator is expensive, and that’s what LINQ to SQL always does by default. Yes, you can specify lazy loading, but not on a per-query basis. Normally, this is a non-issue, because selecting all columns for simple queries is not all that expensive. And you’d think pulling down 48 measly little post records would be squarely in the “not expensive” category!

So let’s compare apples to apples. What if we got just the id numbers, then retrieved the full data for each row?

select top 48 Id from Posts             260 ms
select * from Posts where Id = 12345    3 ms

Now, retrieving 48 individual records one by one is sort of silly, because you could easily construct a single where Id in (1,2,3..,47,48) query that would grab all 48 posts in one go. But even if we did it in this naive way, the total execution time is still a very reasonable (48 * 3 ms) + 260 ms = 404 ms. That is half the time of the standard select-star SQL emitted by LINQ to SQL!
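The two-step approach (grab the ids, then grab the full rows with a single where Id in (...) query) might look roughly like the sketch below. It is written in Python against an in-memory SQLite database, not against LINQ to SQL or SQL Server, so treat it as an illustration of the idea only. The names fetch_recent_posts and make_demo_db and the demo rows are made up, and SQLite's limit stands in for top.

```python
import sqlite3


def fetch_recent_posts(conn, max_posts=48, min_score=-4):
    """Fetch ids first, then all columns for those ids in one IN query."""
    # Step 1: the cheap query, ids only (SQLite's "limit" replaces "top").
    ids = [row[0] for row in conn.execute(
        "select Id from Posts "
        "where PostTypeId = 1 and DeletionDate is null and Score >= ? "
        "order by LastActivityDate desc limit ?",
        (min_score, max_posts),
    )]
    if not ids:
        return []

    # Step 2: one parameterized IN query instead of one query per id.
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"select * from Posts where Id in ({placeholders})", ids
    ).fetchall()

    # IN does not guarantee row order, so restore the order from step 1.
    by_id = {row[0]: row for row in rows}
    return [by_id[post_id] for post_id in ids]


def make_demo_db():
    """Build a throwaway in-memory Posts table with 100 made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Posts (Id integer primary key, PostTypeId integer, "
        "DeletionDate text, Score integer, LastActivityDate text, Body text)"
    )
    conn.executemany(
        "insert into Posts values (?, ?, ?, ?, ?, ?)",
        [(i, 1, None, i % 7 - 5, f"2009-06-{i % 28 + 1:02d}", f"body {i}")
         for i in range(1, 101)],
    )
    return conn
```

Calling fetch_recent_posts(make_demo_db()) returns at most 48 rows, newest activity first. Whether two round trips actually beat one select-star query depends on the database, which is rather the point of this post.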

An extra 400 milliseconds doesn’t sound like much, but slow pages lose users. And why in the world would you perform a slow database query on every single page of your website when you don’t have to?

It’s tempting to blame Linq, but is Linq really at fault here? These seem like identical database operations to me:

1. Give me all columns of data for the top 48 posts.

or

1. Give me just the ids for the top 48 posts.
2. Retrieve all columns of data for each of those 48 ids.

So why in the wide, wide world of sports would one of these seemingly identical operations be twice as slow as the other?

The problem isn’t Linq to SQL. The problem is that we’re attempting to spackle a nice, clean abstraction over a database that is full of highly irregular and unusual real world behaviors. Databases that:

  • may not have the right indexes
  • may misinterpret your query and generate an inefficient query plan
  • are trying to perform an operation that doesn’t fit well in available memory
  • are paging data from disks which might be busy at that particular moment
  • might contain irregularly sized column datatypes

That’s what’s so frustrating. We can’t just pretend all our data is formatted into neat, orderly data structures sitting there in memory, lined up in convenient little queues for us to reach out and casually scoop them up. As I’ve demonstrated, even trivial queries can have bizarre behavior and performance characteristics that are not at all clear.

To its credit, Linq to SQL is quite flexible: we can use strongly typed queries, or we can use SQL blob queries that we cast to the right object type. That flexibility is critical, because so much of our performance depends on these quirks of the database. We default to the built-in Linq language constructs, and drop down to hand-tuning ye olde SQL blobs where the performance traces tell us we need to.

Either way, it’s clear that you’ve got to know what’s happening in the database every step of the way to even begin understanding the performance of your application, much less troubleshoot it.

I think you could make a fairly solid case that Linq to SQL is, in fact, a leaky and failed abstraction. Exactly the kind of thing Joel was complaining about. But I’d also argue that virtually all good programming abstractions are failed abstractions. I don’t think I’ve ever used one that didn’t leak like a sieve. But I think that’s an awfully architecture astronaut way of looking at things. Instead, let’s ask ourselves a more pragmatic question:

Does this abstraction make our code at least a little easier to write? To understand? To troubleshoot? Are we better off with this abstraction than we were without it?

It’s our job as modern programmers not to abandon abstractions due to these deficiencies, but to embrace the useful elements of them, to adapt the working parts and construct ever so slightly less leaky and broken abstractions over time. Like desperate citizens manning a dike in a category 5 storm, we programmers keep piling up these leaky abstractions, shoring up as best we can, desperately attempting to stay ahead of the endlessly rising waters of complexity.

As much as I may curse Linq to SQL as yet another failed abstraction, I’ll continue to use it. Yes, I may end up soggy and irritable at times. But it sure as heck beats drowning.

Continue reading

Posted in Syndicated, Uncategorized | Comments Off on All Abstractions Are Failed Abstractions

Writing Code Is Much Like Writing Prose

There are many similarities when comparing the writing of code to the writing of prose. Because of this, we should be able to learn from doing each of these and apply things learned from one to the other. The Absolutes: In writing code and in writing pro… Continue reading

Posted in General Development, Syndicated | Comments Off on Writing Code Is Much Like Writing Prose

Email Finder: People Research Made Easy

If you have ever wondered about a person from your past, whether it be a distant relative or ex-girlfriend or boyfriend you can’t take your mind off of, then an Email Finder service might be just what you have been looking for. Imagine starting off with just a first and last name of […] Continue reading

Posted in Personal, Syndicated | Comments Off on Email Finder: People Research Made Easy

Java Enums Are Inherently Serializable

More than once, I have seen code such as the following (without the comments I have added to point out flaws), in which a well-intentioned Java developer has ensured that their favorite Enum explicitly declares that it is Serializable and has even prov… Continue reading

Posted in Java (General), Syndicated | Comments Off on Java Enums Are Inherently Serializable

Viewing Names Bound to RMI Registry

When working with Java Remote Method Invocation (RMI), there are times when it is helpful to know which names are currently bound to a particular rmiregistry on a particular host/port combination. This is especially true when debugging problems relate… Continue reading

Posted in GlassFish, Groovy, Java (General), Syndicated | Comments Off on Viewing Names Bound to RMI Registry

The iPhone Software Revolution

The original iPhone was for suckers, er, hard-core gadget enthusiasts only. But as I predicted, 12 months later, the iPhone 3G rectified all the shortcomings of the first version. And now, with the iPhone 3GS, we’ve reached the mythical third version:

A computer industry adage is that Microsoft does not make a successful product until version 3. Its Windows operating system was not a big success until the third version was introduced in 1990 and, similarly, its Internet Explorer browsing software was lackluster until the third version.

The platform is now so compelling and polished that even I took the plunge. For context, this is the first Apple product I’ve owned since 1984. Literally.

I am largely ambivalent towards Apple, but it’s impossible to be ambivalent about the iPhone — and in particular, the latest and greatest iPhone 3GS. It is the Pentium to the 486 of the iPhone 3G. A landmark, genre-defining product, no longer a mere smartphone but an honest to God fully capable, no-compromises computer in the palm of your hand.

Here’s how far I am willing to go: I believe the iPhone will ultimately be judged a more important product than the original Apple Macintosh.

[Image: iPhone 3GS]

Yes, I am dead serious. Just check back here in fifteen to twenty years to see if I was right. (Hint: I will be.)

There’s always been a weird tension in Apple’s computer designs, because they attempt to control every nuance of the entire experience from end to end. For the best Apple™ experience, you run custom Apple™ applications on artfully designed Apple™ hardware dongles. That’s fundamentally at odds with the classic hacker mentality that birthed the general purpose computer. You can see it in the wild west, anything goes Linux ecosystem. You can even see it in the Wintel axis of evil, where a million motley mixtures of hardware, software, and operating system variants are allowed to bloom, like little beige stickered flowers, for a price.

But a cell phone? It’s a closed ecosystem, by definition, running on a proprietary network. By a status quo of incompetent megacorporations who wouldn’t know user friendliness or good design if it ran up behind them and bit them in the rear end of their expensive, tailored suits. All those things that bugged me about Apple’s computers are utter non-issues in the phone market. Proprietary handset? So is every other handset. Locked in to a single vendor? Everyone signs a multi-year contract. One company controlling your entire experience? That’s how it’s always been done. Nokia, Sony/Ericsson, Microsoft, RIM — these guys clearly had no idea what they were in for when Apple set their sights on the cell phone market — a market that is a nearly perfect match to Apple’s strengths.

Apple was born to make a kick-ass phone. And with the lead they have, I predict they will dominate the market for years to come.

Consider all the myriad devices that the iPhone 3GS can sub for, and in some cases, outright replace:

  • GPS
  • Netbook (for casual web browsing and email)
  • Gameboy
  • Watch
  • Camera
  • MP4 Video Recorder
  • MP3 player
  • DVD player
  • eBook reader

Oh yeah, and I heard you can make phone calls with it, too. Like any general purpose computer, it’s a jack of all trades.

As impressive as the new hardware is, the software story is even bigger. If you’re a software developer, the iPhone can become a career changing device, all thanks to one little teeny-tiny icon on the iPhone home screen:

[Image: App Store icon]

The App Store makes it brainlessly easy to install, upgrade, and purchase new applications. But more importantly, any software developer — at the mild entry cost of owning a Mac, and signing up for the $99 iPhone Developer Program — can build an app and sell it to the worldwide audience of iPhone users. Apple makes this stuff look easy, when historically it has been anything but. How many successful garage developers do you know for Nintendo DS? For the Motorola Razr? For Palm? For Windows Mobile?

Apple has never been particularly great at supporting software developers, but I have to give them their due: with the iPhone developer program, they’ve changed the game. Nowhere is this more evident than in software pricing. I went on a software buying spree when I picked up my iPhone 3GS, ending up with almost three pages of new applications from the App Store. I was a little worried that I might rack up a substantial bill, but how can I resist when cool stuff like ports of the classic Amiga Pinball Dreams are available, or the historic Guru Meditation? The list of useful (and useless) apps is almost endless, and growing every day.

My total bill for three screens’ worth of great iPhone software applications? About fifty bucks. I’ve paid more than that for Xbox 360 games I ended up playing for a total of maybe three hours! About half of the apps were free, and the rest were a few bucks. I think the most I paid was $9.99, and that was for an entire library. What’s revolutionary here isn’t just the development ecosystem, but the economics that support it, too. At these crazy low prices, why not fill your phone with cool and useful apps? You might wonder if developers can really make a living selling apps that only cost 99 cents. Sure they can, if they sell hundreds of thousands of copies:

Freeverse, one of the leading developers and publishers of iPhone games, sold the millionth copy of its Flick Fishing game over the weekend, making Flick Fishing the first paid application to reach the one million download milestone. Flick Fishing, which costs 99 cents, allows iPhone and iPod touch users to take a virtual fishing trip with the flick of a wrist. The game uses the iPhone’s accelerometer to recreate a casting motion, then a combination of bait choice and fishing skill helps players land the big fish.

Preliminary weekly reports for the period from 23 March to 19 April indicate that Flight Control sold a total of 587,485 units during this time. We estimate total sales are now over 700,000 units, with the bulk of sales occurring in a 3 week period.
Flight Control

That’s an honorable way to get rich programming, and a nice business alternative to the dog-eat-dog world of advertising subsidized apps.

I love nothing more than supporting my fellow software developers by voting with my wallet. It does my heart good to see so many indie and garage developers making it big on the iPhone. (Also, I’m a sucker for physics games, and there are a bunch of great ones available in the App Store.) I’m more than happy to pitch in a few bucks every month for a great new iPhone app.

If this has all come across as too rah-rah, too uncritical a view of the iPhone, I apologize. There are certainly things to be critical about, such as the App Store’s weird enforcement policies, the lack of support for emulators, or Flash, or anything else that might somehow undermine the platform as decided in some paranoid, secretive Apple back room. Not that we’d ever hear about it.

I didn’t write this to kiss Apple’s ass. I wrote this because I truly feel that the iPhone is a key inflection point in software development. We will look back on this as the time when “software” stopped being something that geeks buy (or worse, bootleg), and started being something that everyone buys, every day. You’d have to be a jaded developer indeed not to find something magical and transformative in this formula, and although others will clearly follow, the iPhone is leading the way.

“There’s an app for that.” Kudos, Apple. From the bottom of my hoary old software developer heart.

Continue reading

Posted in Syndicated, Uncategorized | Comments Off on The iPhone Software Revolution

Thread Analysis with VisualVM

Although jstack (Java Stack Trace) is a useful tool for learning more about how a Java thread is behaving, VisualVM is an even easier method for obtaining the same type of information. It is easy to run jstack as demonstrated in the next screen snapsh… Continue reading

Posted in Java (General), Syndicated, VisualVM | Comments Off on Thread Analysis with VisualVM

Heap Dump and Analysis with VisualVM

In previous blog posts, I have covered using VisualVM to acquire HotSpot JVM runtime information in a manner similar to jinfo and how to use VisualVM in conjunction with JMX and MBeans in a manner similar to JConsole. This blog posting looks at how Vi… Continue reading

Posted in Java (General), Syndicated, VisualVM | Comments Off on Heap Dump and Analysis with VisualVM

Scaling Up vs. Scaling Out: Hidden Costs

In My Scaling Hero, I described the amazing scaling story of plentyoffish.com. It’s impressive by any measure, but also particularly relevant to us because we’re on the Microsoft stack, too. I was intrigued when Markus posted this recent update:

Last monday we upgraded our core database server after a power outage knocked the site offline. I haven’t touched this machine since 2005 so it was a major undertaking to do it last minute. We upgraded from a machine with 64 GB of ram and 8 CPUs to a HP ProLiant DL785 with 512 GB of ram and 32 CPUs

The HP ProLiant DL785 G5 starts at $16,999 — and that’s barebones, with nothing inside. Fully configured, as Markus describes, it’s kind of a monster:

  • 7U size (a typical server is 2U, and mainstream servers are often 1U)
  • 8 CPU sockets
  • 64 memory sockets
  • 16 drive bays
  • 11 expansion slots
  • 6 power supplies

It’s unclear if they bought it pre-configured, or added the disks, CPUs, and memory themselves. The most expensive configuration shown on the HP website is $37,398 and that includes only 4 processors, no drives, and a paltry 32 GB memory. When topped out with ultra-expensive 8 GB memory DIMMs, 8 high end Opterons, 10,000 RPM hard drives, and everything else — by my estimates, it probably cost closer to $100,000. That might even be a lowball number, considering that the DL785 submitted to the TPC benchmark website (pdf) had a “system cost” of $186,700. And that machine only had 256 GB of RAM. (But, to be fair, that total included another major storage array, and a bunch of software.)

At any rate, let’s assume $100,000 is a reasonable ballpark for the monster server Markus purchased. It is the very definition of scaling up — a seriously big iron single server.

But what if you scaled out, instead — Hadoop or MapReduce style, across lots and lots of inexpensive servers? After some initial configuration bumps, I’ve been happy with the inexpensive Lenovo ThinkServer RS110 servers we use. They’re no match for that DL785 — but they aren’t exactly chopped liver, either:

Lenovo ThinkServer RS110 barebones $600
8 GB RAM $100
2 x eBay drive brackets $50
2 x 500 GB SATA hard drives, mirrored $100
Intel Xeon X3360 2.83 GHz quad-core CPU $300

Grand total of $1,150 per server. Plus another 10 percent for tax, shipping, and so forth. I replace the bundled CPU and memory that the server ships with, and then resell the salvaged parts on eBay for about $100 — so let’s call the total price per server $1,200.

Now, assuming a fixed spend of $100,000, we could build 83 of those 1U servers. Let’s compare what we end up with for our money:

 

        Scaling Up    Scaling Out
CPUs    32            332
RAM     512 GB        664 GB
Disk    4 TB          40.5 TB
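For anyone who wants to check the scaling-out column, here is a small sketch of the arithmetic. The per-server figures (one quad-core CPU, 8 GB of RAM, 500 GB usable from the mirrored pair) come from the parts list above. The function name is made up, and dividing by 1024 to get terabytes is my guess at how 40.5 TB was reached, not something the post states.

```python
def scale_out_totals(budget, price_per_server,
                     cores=4, ram_gb=8, usable_disk_gb=500):
    """Total capacity from buying as many identical servers as the budget allows."""
    servers = budget // price_per_server  # whole servers only
    return {
        "servers": servers,
        "cpu_cores": servers * cores,
        "ram_gb": servers * ram_gb,
        # Assumption: 1 TB = 1024 GB, which is what matches the table.
        "disk_tb": round(servers * usable_disk_gb / 1024, 1),
    }


# Parts list from the text: barebones, RAM, brackets, drives, CPU.
PARTS_TOTAL = 600 + 100 + 50 + 100 + 300  # 1150, rounded up to 1200 in the text

totals = scale_out_totals(budget=100_000, price_per_server=1_200)
```

With a $100,000 budget and $1,200 per server this gives 83 servers, 332 cores, 664 GB of RAM and 40.5 TB of disk, the same numbers as the table.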

Now which approach makes more sense?

(These numbers are a bit skewed because that DL785 is at the absolute extreme end of the big iron spectrum. You pay a hefty premium for fully maxing out. It is possible to build a slightly less powerful server with far better bang for the buck.)

But there’s something else to consider: software licensing.

 

       Scaling Up    Scaling Out
OS     $2,310        $33,200*
SQL    $8,318        $49,800*
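The scaling-out license totals are consistent with a flat per-server price multiplied by 83 servers. The sketch below works backwards from the table: the $400 and $600 per-server figures are inferred by dividing the totals by 83, they are not quoted prices, and the function name is made up.

```python
SERVERS = 83  # from the $100,000 / $1,200 estimate earlier in the post


def license_totals(servers, os_per_server, sql_per_server):
    """Total OS and SQL license cost for a fleet of identical servers."""
    return servers * os_per_server, servers * sql_per_server


# Assumption: per-server prices implied by the table (33,200 / 83 and 49,800 / 83).
os_total, sql_total = license_totals(SERVERS, os_per_server=400, sql_per_server=600)

scale_out_licenses = os_total + sql_total  # 83,000
scale_up_licenses = 2_310 + 8_318          # 10,628, single big server
```

On these numbers the licenses for the 83 small servers cost roughly eight times as much as the licenses for the one big server.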

(If you’re using all open source software, then of course these costs will be very close to zero. We’re assuming a Microsoft shop here, with the necessary licenses for Windows Server 2008 and SQL Server 2008.)

Now which approach makes more sense?

What about the power costs? Electricity and rack space aren’t free.

 

                     Scaling Up    Scaling Out
Peak Watts           1,200 W       16,600 W
Power Cost / Year    $1,577        $21,815
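The yearly power figures look like peak watts running around the clock at a flat electricity rate. A rough sketch follows. The $0.15 per kWh rate is my inference from dividing $1,577 by the kilowatt-hours a 1,200 W load uses in a year, the post does not state it, and the function name is made up.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def yearly_power_cost(watts, dollars_per_kwh=0.15):
    """Cost of drawing `watts` continuously for a year at a flat rate."""
    kwh = watts * HOURS_PER_YEAR / 1000
    return kwh * dollars_per_kwh


scale_up_cost = yearly_power_cost(1_200)    # about $1,577
scale_out_cost = yearly_power_cost(16_600)  # about $21,812

watts_per_small_server = 16_600 / 83  # 200 W each
```

At that assumed rate the big server comes out at the table's $1,577, and the 83 small servers land within a few dollars of the table's $21,815, so the rate actually used was probably very close to this one.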

Now which approach makes more sense?

I’m not picking favorites. This is presented as food for thought. There are at least a dozen other factors you’d want to consider depending on the particulars of your situation. Scaling up and scaling out are both viable solutions, depending on what problem you’re trying to solve, and what resources (financial, software, and otherwise) you have at hand.

That said, I think it’s fair to conclude that scaling out is only frictionless when you use open source software. Otherwise, you’re in a bit of a conundrum: scaling up means paying less for licenses and a lot more for hardware, while scaling out means paying less for the hardware, and a whole lot more for licenses.

* I have no idea if these are the right prices for Windows Server 2008 and SQL Server 2008, because reading about the licensing models makes my brain hurt. If anything, it could be substantially more.

Continue reading

Posted in Syndicated, Uncategorized | Comments Off on Scaling Up vs. Scaling Out: Hidden Costs

JMX 2 Postponed Until Java SE 8

It was disappointing, but not altogether surprising, to learn that JMX 2 will not be part of Java SE 7. Anyone who saw my Colorado Software Summit 2008 presentation JMX Circa 2008 is aware of how excited I was for some of the new features that were te… Continue reading

Posted in Java SE 8, JMX, Syndicated | Comments Off on JMX 2 Postponed Until Java SE 8