Archive for June, 2009

30 Jun

All Abstractions Are Failed Abstractions

In programming, abstractions are powerful things:

Joel Spolsky has an article in which he states

All non-trivial abstractions, to some degree, are leaky.

This is overly dogmatic – for example, bignum classes are exactly the same regardless of the native integer multiplication. Ignoring that, this statement is essentially true, but rather inane and missing the point. Without abstractions, all our code would be completely interdependent and unmaintainable, and abstractions do a remarkable job of cleaning that up. It is a testament to the power of abstraction and how much we take it for granted that such a statement can be made at all, as if we always expected to be able to write large pieces of software in a maintainable manner.

But they can cause problems of their own. Let’s consider a particular LINQ to SQL query, designed to retrieve the most recent 48 Stack Overflow questions.

var posts =
  (from p in DB.Posts
  where
  p.PostTypeId == PostTypeId.Question &&
  p.DeletionDate == null &&
  p.Score >= minscore
  orderby p.LastActivityDate descending
  select p).
  Take(maxposts);

The big hook here is that this is code the compiler actually understands. You get code completion, compiler errors if you rename a database field or mistype the syntax, and so forth. Perhaps best of all, you get an honest to goodness post object as output! So you can turn around and immediately do stuff like this:

foreach (var post in posts.ToList())
{
    Render(post.Body);
}

Pretty cool, right?

Well, that Linq to SQL query is functionally equivalent to this old-school SQL blob. More than functionally, it is literally identical, if you examine the SQL string that LINQ generates behind the scenes:

string query =
  @"select top 48 * from Posts
  where
  PostTypeId = 1 and
  DeletionDate is null and
  Score >= -4
  order by LastActivityDate desc";

This text blob is of course totally opaque to the compiler. Fat-finger a syntax error in here, and you won’t find out about it until runtime. Even if it does run without a runtime error, processing the output of the query is awkward. It takes row level references and a lot of tedious data conversion to get at the underlying data.

var posts = DB.ExecuteQuery(query);

foreach (var post in posts.ToList())
{
   Render(post["Body"].ToString());
}
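
To make the "tedious data conversion" concrete, here is a minimal sketch, written in Java rather than C#, of what turning an untyped row into a typed object involves. It assumes a row arrives as a map keyed by column name. The Post class, its fields, and the fromRow helper are hypothetical names for illustration only and are not part of LINQ to SQL or any real data layer.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical typed view of one row of the Posts table. */
final class Post {
    final int id;
    final int score;
    final String body;

    Post(int id, int score, String body) {
        this.id = id;
        this.score = score;
        this.body = body;
    }

    /**
     * Converts an untyped row (column name -> value) into a typed Post.
     * A misspelled column name here is only discovered at runtime, which
     * is the weakness of the raw SQL approach.
     */
    static Post fromRow(Map<String, Object> row) {
        int id = ((Number) row.get("Id")).intValue();
        int score = ((Number) row.get("Score")).intValue();
        String body = String.valueOf(row.get("Body"));
        return new Post(id, score, body);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("Id", 12345);
        row.put("Score", 7);
        row.put("Body", "Hello");
        Post post = fromRow(row);
        System.out.println(post.id + " " + post.score + " " + post.body);
    }
}
```

Every column needs its own cast and its own string key, and none of it is checked by the compiler.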

So, LINQ to SQL is an abstraction — we’re abstracting away raw SQL and database access in favor of native language constructs and objects. I’d argue that Linq to SQL is a good abstraction. Heck, it’s exactly what I asked for five years ago.

But even a good abstraction can break down in unexpected ways.

Consider this optimization, which is trivial in the old-school SQL blob code: instead of pulling down every single field in the post records, why not pull just the id number? Makes sense, if that’s all I need. And it’s faster — much faster!

select top 48 * from Posts     827 ms
select top 48 Id from Posts    260 ms

Selecting all columns with the star (*) operator is expensive, and that’s what LINQ to SQL always does by default. Yes, you can specify lazy loading, but not on a per-query basis. Normally, this is a non-issue, because selecting all columns for simple queries is not all that expensive. And you’d think pulling down 48 measly little post records would be squarely in the “not expensive” category!

So let’s compare apples to apples. What if we got just the id numbers, then retrieved the full data for each row?

select top 48 Id from Posts             260 ms
select * from Posts where Id = 12345      3 ms

Now, retrieving 48 individual records one by one is sort of silly, because you could easily construct a single where Id in (1,2,3..,47,48) query that would grab all 48 posts in one go. But even if we did it in this naive way, the total execution time is still a very reasonable (48 * 3 ms) + 260 ms = 404 ms. That is half the time of the standard select-star SQL emitted by LINQ to SQL!
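
As a rough sketch of both ideas, the Java below builds a single parameterized "where Id in (...)" query and reproduces the arithmetic for the naive one-row-at-a-time plan. The class and method names are hypothetical, and the timings are simply the figures quoted above passed in as arguments, not new measurements.

```java
import java.util.Collections;

/** Sketch of the two follow-up strategies for fetching full rows by id. */
final class PostFetchPlan {

    /** Builds "select * from Posts where Id in (?, ?, ...)" with one placeholder per id. */
    static String buildInQuery(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("idCount must be at least 1");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "select * from Posts where Id in (" + placeholders + ")";
    }

    /** Total time of the naive plan: one id query plus one lookup per row. */
    static int naiveTotalMs(int idQueryMs, int rowCount, int perRowMs) {
        return idQueryMs + rowCount * perRowMs;
    }

    public static void main(String[] args) {
        System.out.println(buildInQuery(3));
        System.out.println(naiveTotalMs(260, 48, 3) + " ms");
    }
}
```

With the numbers from the measurements above, naiveTotalMs(260, 48, 3) gives 404, matching the hand calculation.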

An extra 400 milliseconds doesn’t sound like much, but slow pages lose users. And why in the world would you perform a slow database query on every single page of your website when you don’t have to?

It’s tempting to blame Linq, but is Linq really at fault here? These seem like identical database operations to me:

1. Give me all columns of data for the top 48 posts.

or

1. Give me just the ids for the top 48 posts.

2. Retrieve all columns of data for each of those 48 ids.

So why in the wide, wide world of sports would one of these seemingly identical operations be twice as slow as the other?

The problem isn’t Linq to SQL. The problem is that we’re attempting to spackle a nice, clean abstraction over a database that is full of highly irregular and unusual real world behaviors. Databases that:

  • may not have the right indexes
  • may misinterpret your query and generate an inefficient query plan
  • are trying to perform an operation that doesn’t fit well in available memory
  • are paging data from disks which might be busy at that particular moment
  • might contain irregularly sized column datatypes

That’s what’s so frustrating. We can’t just pretend all our data is formatted into neat, orderly data structures sitting there in memory, lined up in convenient little queues for us to reach out and casually scoop them up. As I’ve demonstrated, even trivial queries can have bizarre behavior and performance characteristics that are not at all clear.

To its credit, Linq to SQL is quite flexible: we can use strongly typed queries, or we can use SQL blob queries that we cast to the right object type. That flexibility is critical, because so much of our performance depends on these quirks of the database. We default to the built-in Linq language constructs, and drop down to hand-tuning ye olde SQL blobs where the performance traces tell us we need to.

Either way, it’s clear that you’ve got to know what’s happening in the database every step of the way to even begin understanding the performance of your application, much less troubleshoot it.

I think you could make a fairly solid case that Linq to SQL is, in fact, a leaky and failed abstraction. Exactly the kind of thing Joel was complaining about. But I’d also argue that virtually all good programming abstractions are failed abstractions. I don’t think I’ve ever used one that didn’t leak like a sieve. But I think that’s an awfully architecture astronaut way of looking at things. Instead, let’s ask ourselves a more pragmatic question:

Does this abstraction make our code at least a little easier to write? To understand? To troubleshoot? Are we better off with this abstraction than we were without it?

It’s our job as modern programmers not to abandon abstractions due to these deficiencies, but to embrace the useful elements of them, to adapt the working parts and construct ever so slightly less leaky and broken abstractions over time. Like desperate citizens manning a dike in a category 5 storm, we programmers keep piling up these leaky abstractions, shoring up as best we can, desperately attempting to stay ahead of the endlessly rising waters of complexity.

As much as I may curse Linq to SQL as yet another failed abstraction, I’ll continue to use it. Yes, I may end up soggy and irritable at times. But it sure as heck beats drowning.

From Coding Horror

29 Jun

Writing Code Is Much Like Writing Prose

There are many similarities when comparing the writing of code to the writing of prose. Because of this, we should be able to learn from doing each of these and apply things learned from one to the other.

The Absolutes

In writing code and in writing prose, there are a few things that are either absolute or approach very nearly the status of absolute. This is especially true for programming languages where syntactic and semantic rules must be followed for the code to be compiled and/or interpreted correctly. Even when a programming language supports certain generally frowned-upon features, some of these features are avoided to such a large degree that they almost appear absolute. For example, direct use of “goto” is generally frowned upon and is rarely seen in most code bases. However, less obvious versions of this (such as break and continue in Java) do seem to be less strictly avoided.
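
As one illustration of the "less obvious" goto, here is a small sketch of a labeled break in Java. The class, method, and label names are made up for this example.

```java
/** Shows a labeled break, Java's structured stand-in for a forward goto. */
final class LabeledBreakExample {

    /** Returns the row index of the first cell equal to target, or -1 if absent. */
    static int findRow(int[][] grid, int target) {
        int foundRow = -1;
        search:
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                if (grid[row][col] == target) {
                    foundRow = row;
                    break search; // leaves both loops at once
                }
            }
        }
        return foundRow;
    }

    public static void main(String[] args) {
        int[][] grid = { {1, 2}, {3, 4}, {5, 6} };
        System.out.println(findRow(grid, 4));
    }
}
```

The jump is limited to leaving an enclosing block, which may be why it draws less criticism than an unrestricted goto.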

Although it is less strictly enforced in writing prose, there still is significant pressure to conform to certain absolutes even in writing prose. For example, it is generally assumed that most professional prose will include sentences that begin with capital letters and end with periods. Similarly, proper names are almost always capitalized and correct spelling and reasonable grammar are also expected. The degree of enforcement for such things in prose often depends on the media. Professional papers and articles typically are the most enforced with the author and professional editors investing significant effort into polishing the prose. On the opposite end of the spectrum are e-mail messages, blogs, and Twitter messages, which seem to have less enforced absolutes.

The Frowned-Upon

Although there are a small number of absolutes or near-absolutes as just discussed, code development and prose writing seem to have many more things that are not absolutely avoided but seem to be strongly discouraged. However, these things tend to creep in despite their negative reputations because they do offer some advantages. Usually the advantage is ease for the writer, at the expense of a later reader or maintainer who is left with more difficult prose or code.

Many developers realize problems associated with using so-called “magic numbers.” However, they still seem to crop up. Often they are put in place “temporarily” and then forgotten or limited schedules prevent replacing them with a constant. These “magic numbers” are quick and easy to use when doing initial development, but can lead to a maintenance nightmare. Global variables offer a similar trade-off of easy early development at the expense of maintainability, robustness, and scalability.
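
A minimal sketch of the trade-off, using a hypothetical page-size limit: both methods below compute the same result, but only one of them tells the maintainer what the number means.

```java
/** Contrasts a magic number with a named constant. */
final class Pagination {

    /** Hypothetical limit on items shown per page. */
    static final int MAX_ITEMS_PER_PAGE = 25;

    /** Magic-number version: the reader must guess what 25 and 24 mean. */
    static int pagesNeededMagic(int itemCount) {
        return (itemCount + 24) / 25;
    }

    /** Named-constant version: same arithmetic, with the intent stated once. */
    static int pagesNeeded(int itemCount) {
        return (itemCount + MAX_ITEMS_PER_PAGE - 1) / MAX_ITEMS_PER_PAGE;
    }

    public static void main(String[] args) {
        System.out.println(pagesNeededMagic(101));
        System.out.println(pagesNeeded(101));
    }
}
```

If the limit changes, the named version needs one edit, while the magic-number version needs every 25 and every 24 found and corrected by hand.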

Prose authoring has similar frowned-upon, but still often used, features. For example, it is often said that sentences should not end with prepositions. Similarly, it is often said that strong, active voice should be used. The tense of the writing should also be consistent. These are all things that are recommended because they do offer recognized benefits, but they are also easy to ignore or cheat on a bit when it is not deemed worth the time or effort to satisfy all of them all of the time.

The Standards

Since nearly the beginning of software development, developers have seemed to want to create and adhere to coding standards and conventions. Of course, we also seem to have been resistant to other people’s ideas of standards and conventions for nearly as long. The reason most of us are willing to give up some “creative freedom” and adhere to standards and conventions is that we have learned that code is more readable and maintainable (especially by others) when we adhere to a minimum set of conventions.

Prose can enjoy the same benefits of standardization and convention. There are books such as Elements of Style and Chicago Manual of Style devoted to prose style. Most of the arguments in favor of these prose style conventions are the same ones made for coding conventions: text that is easier to read and maintain, and consistency that benefits different readers and authors/developers.

One style issue is very similar between prose writing and code development. The subject of spaces can be surprisingly controversial in both arenas. In software development, most developers seem to agree that the optimal number of spaces for indentation is between 2 and 4 spaces. However, trying to narrow down which of these is best (2 spaces or 3 spaces or 4 spaces) is significantly more difficult.

There has been a controversy in the prose writing world regarding how many spaces should follow a period that ends one sentence before the first letter of the next sentence. I grew up thinking that two spaces was the expected number of spaces between the end of one sentence and the beginning of the next sentence. However, I was recently informed by a prose reviewer and editor that a single space is now preferred. I wondered if this was a web browser-inspired shift, but there seems to be evidence that this shift started even before the widespread adoption of HTML.

Conciseness

There seem to be differences of opinion on whether prose should be concise or verbose. To some extent, this depends on the subject of the prose. I prefer technical prose to be as concise as possible while still remaining thorough. This is especially true of technical references. However, with novels, extra verbosity can sometimes be nice to explain the story and character development. This can even go too far, for my taste, as evidenced by Moby Dick.

I have found that even in software development there is a wide diversity of opinion about conciseness of code. The longer I work in the industry, the more I value conciseness. However, I know many Java developers who don’t like the same degree of conciseness that I like. An example of this is the Java ternary operator. This operator has really grown on me, but I still know many Java developers who do not care for it at all. Although many developers are migrating to programming languages that emphasize and value conciseness, there are limits to how concise we want to be. After all, none of us are probably too excited to write and maintain production code that can be sneakily squeezed into a single line.
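
For readers who have not formed an opinion yet, here is a small sketch of the same logic written with if/else and with the ternary operator. The class and method names are made up for illustration.

```java
/** The same pluralization logic with if/else and with the ternary operator. */
final class TernaryExample {

    /** Verbose form: a local variable assigned in two branches. */
    static String describeIfElse(int count) {
        String label;
        if (count == 1) {
            label = "item";
        } else {
            label = "items";
        }
        return count + " " + label;
    }

    /** Concise form: the choice is an expression inside the return. */
    static String describeTernary(int count) {
        return count + " " + (count == 1 ? "item" : "items");
    }

    public static void main(String[] args) {
        System.out.println(describeIfElse(1));
        System.out.println(describeTernary(3));
    }
}
```

Which of the two reads better is exactly the kind of preference on which developers differ.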

Refactoring

“Refactoring” is a popular term in software development, but it does have its equivalent in prose writing. In my case, I find that when I revisit my own articles, I continually “refactor” the text to make it flow better, to reduce unnecessary repetition, and to make it generally more concise. The editors of formal articles often do this to an even greater degree. In fact, the editors’ reviews often remind me of how developers are eager to change other people’s code to match their own preferences. Some of the “refactorings” I see in both development and in article editing have marginal value. However, I think most of us can agree that some “refactoring” or editing is useful and recommended for writing code or for writing prose.

When the editors and reviewers at Oracle Technology Network recommended cutting my original draft of the Basic JPA Best Practices article to less than half its original draft size, it took some effort to “refactor” that article to that point. Although some minor details and some explanatory text were removed in the process, most of the substance was retained even though the final article had half as many words as my original draft. It was not trivial trimming that draft down without losing too much substantial content, but the effort the reviewers, editors and I invested is reflected in the improvements. That article still weighs in around 11 pages, but it is leaner, tighter, and more optimized than my original draft. That sounds awfully similar to the benefits of code refactoring. The process really was like refactoring because it was more than just removing words; it involved changing words and changing sentence structure and paragraph structure.

I don’t spend much time reviewing my blog posts at the time of their writing. I think this is common among blogs, though some are exceptions. Because of this, most of us expect blog posts to be rougher than formal articles. There are definitely different expectations for different forms of writing. Similarly in code, prototypes and demonstration code can often be a little “rougher around the edges” than highly reviewed and refactored production code.

More Knowledge Means More Expressiveness

When writing prose, one of the most useful techniques for writing concise but thorough prose is to know and carefully use the appropriate words and phrases. Words have different nuances and these nuances can be used to provide more expressiveness with the same number of words. In code, we see the same thing. There is often more than one way to get the job done, but thorough understanding of the language’s features and provided class libraries allows us to select the most appropriate language feature or class that provides the exact nuanced solution appropriate to the problem at hand.

When writing prose, it is common to use well-known idioms and phrases to imply much more than the few words would normally imply. For example, “a picture is worth a thousand words” consists of only seven words but implies much more than what we might say with only seven words. Some assumed knowledge is required (readers must be familiar with the idiom) to make this work. In code, we often use design patterns and other common phrases to succinctly describe much larger ideas that otherwise would require much more description.

The Value of Review

I have found that both code and prose that I write benefit when reviewed by someone else. When I write my own code or prose, I know what I am trying to say and it all makes sense. Reviewers of articles and reviewers of code can ask questions about what is intended and provide feedback that makes the code or article more generally appealing. Sometimes we’re too close to the product for our own good and the reviewer can help us to see things that we don’t see.

“Readable” is in the Eye of the Beholder

To some degree, what is “readable” depends on the person doing the reading. This applies to both prose and code. Readers (whether reading prose or someone’s code) have their own preferences. Just as we all like different prose authors’ writing, it is not surprising that we each find different styles of code easier or more difficult to read. For example, I have an easier time reading code written by people who have similar tastes and preferences to mine. For this reason, I don’t think we’ll ever see a single programming language or framework that everyone uses. There is just too wide of a spectrum of differences of opinion for any one language or framework to appeal to everyone. This is also an important observation to realize when writing code or prose. You can try to appeal to the widest set possible, but no matter what you do there will probably be at least a small group of people who don’t like it.

Conclusion

Writing prose and writing code have much in common. Many of the same techniques that make better prose also make better code. In both cases, knowing what one has to work with (vocabulary and common phrases for prose and language features and class libraries for code) can make it easier to write particularly effective prose or code. Both types of writing also benefit tremendously from review. Many of the same controversies surround both types of writing.

From Dustin's Software Development Cogitations and Speculations

29 Jun

Email Finder: People Research Made Easy

If you have ever wondered about a person from your past, whether it be a distant relative or an ex-girlfriend or ex-boyfriend you can’t take your mind off of, then an Email Finder service might be just what you have been looking for. Imagine starting off with just a first and last name of the targeted person, and from there being able to find out their e-mail, phone numbers and current address. Those are the claims examined in this in-depth review of the Email Finder service.

At first glance, this program seems very basic. The simplicity is actually a major attribute of the service, because it allows anyone to use it without the need for a detailed guide.

It doesn’t get any easier than typing in a first and last name, pressing start and waiting for the results.

The signup process was also a breeze. No need to fill in page after page of information. Under this veil of simplicity lies an intricate system that searches through multiple databases to find the person you are looking for. The search also offers information on the many social networks that person could possibly be using. Myspace, Facebook, and other popular social communities are quickly researched and any matching profile is brought to your attention.

There is also the option to search e-mail addresses. This can be useful for users that constantly use sites such as Craigslist and Ebay. After finding a great deal, it can always be useful to check into the offering party to see if they are legit. Simply search the provided e-mail address through the Email Finder service and soon you will be able to see if the address is valid, whether the e-mail is blacklisted, and the IP address that is associated with it. This process can save you hundreds of dollars from shady dealings.

After using the Email Finder, certain key points seem to come to mind. These should all be taken into account before purchasing.

Pros:

• Quick and easy sign up that enables search within minutes.
• Thorough information provided on the desired search object.
• Offers information for online service providers in case of shady dealings.
• Ability to opt out your own information if you require it.

Cons:

• Although this should be expected, not every search ends with your desired information. In this situation, the staff does what it can to help, and offers a refund if the search ends with no results.

When all the information is combined, this affordable service has an impressive resume for potential customers to take into account. At $1.95 a month with a money back guarantee, there is no reason not to try it. So use Email Finder to do the research for you!

From Geek Daily

26 Jun

Java Enums Are Inherently Serializable

More than once, I have seen code such as the following (without the comments I have added to point out flaws), in which a well-intentioned Java developer has ensured that their favorite Enum explicitly declares that it is Serializable and has even provided a serialVersionUID for it.

import java.io.Serializable;

/**
 * Enum example with unnecessary and ignored serialization specification
 * details.  The Enum is already Serializable and attempts to control its
 * serialization behavior are ignored.  See Section 1.12 ("Serialization of Enum
 * Constants") of the "Java Object Serialization Specification Version 6.0".
 */
public enum StateEnum implements Serializable
{
   ALABAMA("Alabama", "AL"),
   CALIFORNIA("California", "CA"),
   COLORADO("Colorado", "CO"),
   IDAHO("Idaho", "ID"),
   UTAH("Utah", "UT"),
   WYOMING("Wyoming", "WY");

   // Don't do this: Don't specify serialVersionUID for enums and don't use
   // an arbitrary constant such as 42L for all versions; use serialver on Sun JDK
   private static final long serialVersionUID = 42L;

   private String stateName;
   private String stateAbbreviation;

   StateEnum(final String newStateName, final String newStateAbbreviation)
   {
      this.stateName = newStateName;
      this.stateAbbreviation = newStateAbbreviation;
   }
}

Because enums are automatically Serializable (see Javadoc API documentation for Enum), there is no need to explicitly add the “implements Serializable” clause following the enum declaration. Once this is removed, the import statement for the java.io.Serializable interface can also be removed. If you have any doubts about Enum being Serializable, run the HotSpot-provided serialver tool against your favorite enum that does not declare itself Serializable. The tool will return 0L for all enums. When a class is not Serializable, this tool returns the message “Class –yourClassNameHere– is not Serializable.” An example of this is shown in the next screen snapshot.

The fact that serialver returns 0L for the enum’s serialVersionUID indicates that the enum is indeed Serializable. The Javadoc also indicates this. A third way to prove this to yourself is to use the instanceof operator as shown in the next code sample.

import java.io.Serializable;

public class UsesStateEnum
{
   private StateEnum state;

   public UsesStateEnum(final StateEnum newState)
   {
      this.state = newState;
   }

   public StateEnum getState()
   {
      return this.state;
   }

   public void verifyEnumIsSerializable()
   {
      System.out.print("StateEnum instance of Serializable? ");
      System.out.println(this.state instanceof Serializable ? "yes" : "no");
   }

   public static void main(final String[] arguments)
   {
      System.out.println("Verify Enum is Serializable");
      final UsesStateEnum me = new UsesStateEnum(StateEnum.COLORADO);
      me.verifyEnumIsSerializable();
   }
}

As mentioned above, all Enums have a serialVersionUID of 0L. Therefore, it is not necessary to specify one as is shown in the code above. In fact, when one is specified, it is ignored anyway. The example above intentionally used the hard-coded 42L used in Joshua Bloch’s Effective Java example of how not to create a serialVersionUID. As the screen snapshot below indicates, this explicitly specified value is ignored anyway:

The above screen snapshot also demonstrates an advantage of running serialver against a class to generate the serialVersionUID rather than making up an arbitrary long value such as 42L. By using the script, we get the 0L result for all enums and improve our chances of remembering that enums all have 0L for this value and don’t need it explicitly specified.
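
The same check can be approximated in code without the serialver tool. The sketch below uses the standard java.io.ObjectStreamClass class to ask the serialization runtime which serialVersionUID it will use for a class. The Planet enum and the helper method name are hypothetical and exist only for this example.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Asks the serialization runtime which serialVersionUID it uses for a class. */
final class EnumSerialVersionCheck {

    /** Hypothetical enum that declares a serialVersionUID it does not need. */
    enum Planet {
        MERCURY, VENUS, EARTH;

        // Declared only to show that the runtime does not use it for enums.
        private static final long serialVersionUID = 42L;
    }

    /** Returns the effective serialVersionUID, or fails if the type is not Serializable. */
    static long serialVersionUidOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            throw new IllegalArgumentException(type.getName() + " is not Serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Effective serialVersionUID: " + serialVersionUidOf(Planet.class));
        System.out.println("Serializable? " + (Planet.EARTH instanceof Serializable));
    }
}
```

ObjectStreamClass.lookup returns null for a class that is not Serializable, which plays the role of serialver's "is not Serializable" message.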

Although it does not hurt anything to unnecessarily specify that an enum implements Serializable or to even provide an ignored serialVersionUID, I prefer not to include these. One might argue that at least adding “implements Serializable” communicates the intent to have an enum be Serializable, but my feeling is that this is a fundamental part of the language since J2SE 5 and such communication should be unnecessary. When building a class that needs to be Serializable, using enum constituent pieces can be treated just the same as using Strings and primitives and the reference types corresponding to primitives.
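
To illustrate that last point, here is a minimal sketch of a Serializable class that holds an enum field next to a String and is written to and read back from an in-memory byte array. The Office class, the Region enum, and the roundTrip helper are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Round-trips a Serializable object that contains an enum field. */
final class EnumFieldRoundTrip {

    /** Plain enum: no "implements Serializable" and no serialVersionUID. */
    enum Region { WEST, MOUNTAIN, SOUTH }

    /** Hypothetical Serializable class whose state includes an enum. */
    static final class Office implements Serializable {
        private static final long serialVersionUID = 1L;
        final String city;
        final Region region;

        Office(String city, Region region) {
            this.city = city;
            this.region = region;
        }
    }

    /** Serializes the office to bytes and reads it back as a new object. */
    static Office roundTrip(Office original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Office) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("Round trip failed", ex);
        }
    }

    public static void main(String[] args) {
        Office copy = roundTrip(new Office("Denver", Region.MOUNTAIN));
        System.out.println(copy.city + " " + copy.region);
        System.out.println("Same enum constant? " + (copy.region == Region.MOUNTAIN));
    }
}
```

The enum field needs no special handling, and the deserialized value is the very same constant, so comparing it with == remains safe.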

All of the details I demonstrated and explained in this blog posting related to Enums being inherently Serializable are concisely described in two paragraphs of Section 1.12 (“Serialization of Enum Constants”) of the Java Object Serialization Specification.

Additional Resources

Java Object Serialization Specification

Serialization of Enum Constants

Object Serialization: Frequently Asked Questions

Into the Mist of Serialization Myths

Flatten Your Objects: Discover the Secrets of the Java Serialization API

Java Serialization Algorithm Revealed

From Dustin's Software Development Cogitations and Speculations

25 Jun

Viewing Names Bound to RMI Registry

When working with Java Remote Method Invocation (RMI), there are times when it is helpful to know which names are currently bound to a particular rmiregistry on a particular host/port combination. This is especially true when debugging an RMI client that cannot reach an RMI server because the requested name is not bound in the registry (NotBoundException), or a server that fails to register because the provided name is already bound (AlreadyBoundException).

A simple Java application can be written that provides all named bindings for an RMI registry on a particular host and port. The simple application demonstrated in this posting takes advantage of standard Java classes such as LocateRegistry, Registry, and other classes and exceptions in the java.rmi and java.rmi.registry packages. The code for this application is shown next.

RmiPortNamesDisplay.java

package dustin.examples.rmi;

import java.rmi.ConnectException;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

/**
 * Display names bound to RMI registry on provided host and port.
 */
public class RmiPortNamesDisplay
{
   private final static String NEW_LINE = System.getProperty("line.separator");

   /**
    * Main executable function for printing out RMI registry names on provided
    * host and port.
    *
    * @param arguments Command-line arguments; Two expected: first is a String
    *    representing a host name ('localhost' works) and the second is an
    *    integer representing the port.
    */
   public static void main(final String[] arguments)
   {
      if (arguments.length < 2)
      {
         System.err.println(
            "A host name (String) and a port (Integer) must be provided.");
         System.err.println(
            "\tExample: java dustin.examples.rmi.RmiPortNamesDisplay localhost 1099");
         System.exit(-2);
      }

      final String host = arguments[0];
      int port = 1099;
      try
      {
         port = Integer.valueOf(arguments[1]);
      }
      catch (NumberFormatException numericFormatEx)
      {
         System.err.println(
              "The provided port value [" + arguments[1] + "] is not an integer."
            + NEW_LINE + numericFormatEx.toString());
      }

      try
      {
         final Registry registry = LocateRegistry.getRegistry(host, port);
         final String[] boundNames = registry.list();
         System.out.println(
            "Names bound to RMI registry at host " + host + " and port " + port + ":");
         for (final String name : boundNames)
         {
            System.out.println("\t" + name);
         }
      }
      catch (ConnectException connectEx)
      {
         System.err.println(
              "ConnectionException - Are you certain an RMI registry is available at port "
            + port + "?" + NEW_LINE + connectEx.toString());
      }
      catch (RemoteException remoteEx)
      {
         System.err.println("RemoteException encountered: " + remoteEx.toString());
      }
   }
}

To test out the above application, I can start any service exposing an RMI interface. For this example, I have started a GlassFish domain as shown in the next screen snapshot.

The port on which GlassFish exposes its JMX RMI interface for management and monitoring is highlighted in the screen snapshot and is 8686. When I run the simple RMI port names display application shown above on the same host on which I ran GlassFish, I can use “localhost” as the host. When I run the above Java application, I see two bound names on the RMI registry on localhost at port 8686. This is shown in the next screen snapshot.

From the results shown in the above image, we see that GlassFish exposes two named services on port 8686: jmxrmi and management/rmi-jmx-connector.

The simple application shown above uses standard Java libraries and classes, but also has a “script” feel. It seems like what would really work well here is a script language that uses Java classes. In other words, a scripting language that runs on the JVM such as JRuby or Groovy seems like the perfect fit. With that in mind, the next code listing shows a Groovy implementation of the application written above in traditional Java.

rmiPortNamesDisplay.groovy

import java.rmi.ConnectException
import java.rmi.RemoteException
import java.rmi.registry.LocateRegistry
import java.rmi.registry.Registry

if (args.length < 2)
{
   println "A host name (String) and a port (Integer) must be provided."
   println "\tExample: groovy rmiPortNamesDisplay localhost 1099"
   System.exit(-2)
}

host = args[0]
port = 1099
try
{
   port = Integer.valueOf(args[1])
}
catch (NumberFormatException numericFormatEx)
{
   println "The provided port value '${args[1]}' is not an integer."
   System.exit(-1)
}

registry = LocateRegistry.getRegistry(host, port)
boundNames = registry.list()
println "Names bound to RMI registry at host ${host} and port ${port}:"
boundNames.each{println "\t${it}"}

The above Groovy script, like the Java application from which it was adapted, can be run on the command line. This is shown in the next screen snapshot.

Besides the obvious syntactic differences, one advantage of writing something like this in Groovy is that it does not require an explicit compilation step. The Java application above had to be compiled into a Java .class file first and then executed. With Groovy, this is all done implicitly so that to the user it just feels like running a text-based script. I am especially fond of using Groovy in cases like this where I wish to combine scripting features with accessibility to the JVM and standard Java libraries.

Conclusion

When working with RMI, there are times when it is important to know which named services are already bound to an RMI registry at a given host and port. This blog posting has demonstrated use of the LocateRegistry, Registry, and other relevant classes to do this in a fairly easy manner with traditional Java and with Groovy. It is a straightforward process to extend the Java application and Groovy script shown in this blog posting to cover multiple hosts and ports to, in effect, search for RMI registered names.
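As a rough illustration of that extension, the sketch below loops over several hosts and ports and collects the bound names from every registry that answers. The class name RmiRegistryScan and its scan method are hypothetical names chosen for this example; only LocateRegistry.getRegistry and Registry.list come from the standard library. The ports in main (1099 and 8686) are simply the ones mentioned in this post.

```java
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: lists bound names for each host/port pair that answers. */
final class RmiRegistryScan {

    /** Returns "host:port" mapped to the names bound there; unreachable endpoints are skipped. */
    static Map<String, List<String>> scan(List<String> hosts, List<Integer> ports) {
        Map<String, List<String>> namesByEndpoint = new LinkedHashMap<>();
        for (String host : hosts) {
            for (int port : ports) {
                try {
                    // getRegistry only builds a local stub; list() makes the remote call.
                    Registry registry = LocateRegistry.getRegistry(host, port);
                    namesByEndpoint.put(host + ":" + port, Arrays.asList(registry.list()));
                } catch (RemoteException unreachable) {
                    // Nothing usable answered here (java.rmi.ConnectException is a
                    // RemoteException), so move on to the next endpoint.
                }
            }
        }
        return namesByEndpoint;
    }

    public static void main(String[] args) {
        Map<String, List<String>> found =
                scan(Arrays.asList("localhost"), Arrays.asList(1099, 8686));
        for (Map.Entry<String, List<String>> entry : found.entrySet()) {
            System.out.println("Names bound to RMI registry at " + entry.getKey() + ":");
            for (String name : entry.getValue()) {
                System.out.println("\t" + name);
            }
        }
    }
}
```

Swallowing RemoteException keeps the scan going when a port has no registry behind it, at the cost of hiding the reason an endpoint was skipped; a fuller version might log it.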

From Dustin's Software Development Cogitations and Speculations

25 Jun

The iPhone Software Revolution

The original iPhone was for suckers, er, hard-core gadget enthusiasts only. But as I predicted, 12 months later, the iPhone 3G rectified all the shortcomings of the first version. And now, with the iPhone 3GS, we’ve reached the mythical third version:

A computer industry adage is that Microsoft does not make a successful product until version 3. Its Windows operating system was not a big success until the third version was introduced in 1990 and, similarly, its Internet Explorer browsing software was lackluster until the third version.

The platform is now so compelling and polished that even I took the plunge. For context, this is the first Apple product I’ve owned since 1984. Literally.

I am largely ambivalent towards Apple, but it’s impossible to be ambivalent about the iPhone — and in particular, the latest and greatest iPhone 3GS. It is the Pentium to the 486 of the iPhone 3G. A landmark, genre-defining product, no longer a mere smartphone but an honest to God fully capable, no-compromises computer in the palm of your hand.

Here’s how far I am willing to go: I believe the iPhone will ultimately be judged a more important product than the original Apple Macintosh.

[Image: iPhone 3GS]

Yes, I am dead serious. Just check back here in fifteen to twenty years to see if I was right. (Hint: I will be.)

There’s always been a weird tension in Apple’s computer designs, because they attempt to control every nuance of the entire experience from end to end. For the best Apple™ experience, you run custom Apple™ applications on artfully designed Apple™ hardware dongles. That’s fundamentally at odds with the classic hacker mentality that birthed the general purpose computer. You can see it in the wild west, anything goes Linux ecosystem. You can even see it in the Wintel axis of evil, where a million motley mixtures of hardware, software, and operating system variants are allowed to bloom, like little beige stickered flowers, for a price.

But a cell phone? It’s a closed ecosystem, by definition, running on a proprietary network. By a status quo of incompetent megacorporations who wouldn’t know user friendliness or good design if it ran up behind them and bit them in the rear end of their expensive, tailored suits. All those things that bugged me about Apple’s computers are utter non-issues in the phone market. Proprietary handset? So is every other handset. Locked in to a single vendor? Everyone signs a multi-year contract. One company controlling your entire experience? That’s how it’s always been done. Nokia, Sony/Ericsson, Microsoft, RIM — these guys clearly had no idea what they were in for when Apple set their sights on the cell phone market — a market that is a nearly perfect match to Apple’s strengths.

Apple was born to make a kick-ass phone. And with the lead they have, I predict they will dominate the market for years to come.

Consider all the myriad devices that the iPhone 3GS can sub for, and in some cases, outright replace:

  • GPS
  • Netbook (for casual web browsing and email)
  • Gameboy
  • Watch
  • Camera
  • MP4 Video Recorder
  • MP3 player
  • DVD player
  • eBook reader

Oh yeah, and I heard you can make phone calls with it, too. Like any general purpose computer, it’s a jack of all trades.

As impressive as the new hardware is, the software story is even bigger. If you’re a software developer, the iPhone can become a career changing device, all thanks to one little teeny-tiny icon on the iPhone home screen:

[Image: App Store icon]

The App Store makes it brainlessly easy to install, upgrade, and purchase new applications. But more importantly, any software developer — at the mild entry cost of owning a Mac, and signing up for the $99 iPhone Developer Program — can build an app and sell it to the worldwide audience of iPhone users. Apple makes this stuff look easy, when historically it has been anything but. How many successful garage developers do you know for Nintendo DS? For the Motorola Razr? For Palm? For Windows Mobile?

Apple has never been particularly great at supporting software developers, but I have to give them their due: with the iPhone developer program, they’ve changed the game. Nowhere is this more evident than in software pricing. I went on a software buying spree when I picked up my iPhone 3GS, ending up with almost three pages of new applications from the App Store. I was a little worried that I might rack up a substantial bill, but how can I resist when cool stuff like ports of the classic Amiga Pinball Dreams are available, or the historic Guru Meditation? The list of useful (and useless) apps is almost endless, and growing every day.

My total bill for 3 screens worth of great iPhone software applications? About fifty bucks. I’ve paid more than that for Xbox 360 games I ended up playing for a total of maybe three hours! About half of the apps were free, and the rest were a few bucks. I think the most I paid was $9.99, and that was for an entire library. What’s revolutionary here isn’t just the development ecosystem, but the economics that support it, too. At these crazy low prices, why not fill your phone with cool and useful apps? You might wonder if developers can really make a living selling apps that only cost 99 cents. Sure you can, if you sell hundreds of thousands of copies:

Freeverse, one of the leading developers and publishers of iPhone games, sold the millionth copy of its Flick Fishing game over the weekend, making Flick Fishing the first paid application to reach the one million download milestone. Flick Fishing, which costs 99 cents, allows iPhone and iPod touch users to take a virtual fishing trip with the flick of a wrist. The game uses the iPhone’s accelerometer to recreate a casting motion, then a combination of bait choice and fishing skill helps players land the big fish.

Preliminary weekly reports for the period from 23 March to 19 April indicate that Flight Control sold a total of 587,485 units during this time. We estimate total sales are now over 700,000 units, with the bulk of sales occurring in a 3 week period.
Flight Control

That’s an honorable way to get rich programming, and a nice business alternative to the dog-eat-dog world of advertising subsidized apps.

I love nothing more than supporting my fellow software developers by voting with my wallet. It does my heart good to see so many indie and garage developers making it big on the iPhone. (Also, I’m a sucker for physics games, and there are a bunch of great ones available in the App Store.) I’m more than happy to pitch in a few bucks every month for a great new iPhone app.

If this has all come across as too rah-rah, too uncritical a view of the iPhone, I apologize. There are certainly things to be critical about, such as the App Store’s weird enforcement policies, the lack of support for emulators, or Flash, or anything else that might somehow undermine the platform as decided in some paranoid, secretive Apple back room. Not that we’d ever hear about it.

I didn’t write this to kiss Apple’s ass. I wrote this because I truly feel that the iPhone is a key inflection point in software development. We will look back on this as the time when “software” stopped being something that geeks buy (or worse, bootleg), and started being something that everyone buys, every day. You’d have to be a jaded developer indeed not to find something magical and transformative in this formula, and although others will clearly follow, the iPhone is leading the way.

“There’s an app for that.” Kudos, Apple. From the bottom of my hoary old software developer heart.

From Coding Horror

24 Jun

Thread Analysis with VisualVM

Although jstack (Java Stack Trace) is a useful tool for learning more about how a Java thread is behaving, VisualVM is an even easier method for obtaining the same type of information.

It is easy to run jstack as demonstrated in the next screen snapshot:

Only the top portion of the generated stack trace information is shown above. The output of an entire jstack run looks similar to that which follows:

2009-06-24 23:25:26
Full thread dump Java HotSpot(TM) Client VM (14.0-b16 mixed mode, sharing):

"RMI Scheduler(0)" daemon prio=6 tid=0x04bb4c00 nid=0x126c waiting on condition [0x048ef000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x2492dac8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
        at java.util.concurrent.DelayQueue.take(DelayQueue.java:160)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
        - None

"RMI TCP Accept-0" daemon prio=6 tid=0x023bfc00 nid=0xa74 runnable [0x0483f000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
        - locked <0x2492dc48> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:453)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
        at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
        - None

"Low Memory Detector" daemon prio=6 tid=0x02348000 nid=0xec4 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"CompilerThread0" daemon prio=10 tid=0x02343000 nid=0x108 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Attach Listener" daemon prio=10 tid=0x02342800 nid=0xf6c waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Signal Dispatcher" daemon prio=10 tid=0x02339c00 nid=0xac runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Finalizer" daemon prio=8 tid=0x022f4000 nid=0xd14 in Object.wait() [0x044cf000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x248ee9a0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x248ee9a0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

   Locked ownable synchronizers:
        - None

"Reference Handler" daemon prio=10 tid=0x022f2c00 nid=0x1198 in Object.wait() [0x0240f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x248eea28> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x248eea28> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
        - None

"main" prio=6 tid=0x00209000 nid=0x1134 runnable [0x002af000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
        at java.lang.StringBuilder.append(StringBuilder.java:119)
        at dustin.examples.tools.AnalyzableImpl.loopProvidedNumberOfTimes(AnalyzableImpl.java:48)
        at dustin.examples.tools.AnalyzableImpl.main(AnalyzableImpl.java:80)

   Locked ownable synchronizers:
        - None

"VM Thread" prio=10 tid=0x022f1400 nid=0x10c runnable

"VM Periodic Task Thread" prio=10 tid=0x02348c00 nid=0x278 waiting on condition

JNI global references: 872
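For comparison, a dump along the same lines can be produced from inside a running application with the standard java.lang.management API. This is a minimal sketch of a different technique, not what jstack or VisualVM do internally: it uses ThreadMXBean.dumpAllThreads, the class name InProcessThreadDump is made up for this example, and the output only imitates the general layout above (it has no tid, nid, or lock details).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: formats the JVM's own threads in a jstack-like layout. */
final class InProcessThreadDump {

    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitors and ownable synchronizers, which
        // not every JVM supports reporting.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" id=").append(info.getThreadId()).append('\n');
            out.append("   java.lang.Thread.State: ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("        at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Unlike jstack, this only sees the JVM it runs in, so it suits diagnostics endpoints and logging rather than inspecting another process.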

VisualVM makes it easy to monitor application threads. VisualVM offers the capability to generate and view the jstack-generated stack trace. The first way to do this is to right-click on the appropriate Java process and select the option “Thread Dump.” This will generate a thread dump file whose name appears under the selected Java process as shown in the following screen snapshot.

A second way to get the thread dump generated in VisualVM is to use the “Threads” tab and click on the button “Thread Dump.” This button is demonstrated in the next screen snapshot.

Whether the right-click option is used or the “Thread Dump” button is pressed, VisualVM generates a jstack thread dump output file and displays it as shown in the next screen snapshot.

As discussed and demonstrated, VisualVM allows for easy generation of a stack trace dump with jstack. However, VisualVM provides much more than that for thread analysis.

The screen snapshot above that showed the “Thread Dump” button also shows VisualVM’s Timeline tab, which displays the live threads as they run. The colored horizontal bars represent individual threads. They are displayed because the “Threads visualization” checkbox is checked.

Another useful tab on VisualVM is the “Threads” “Table” view that provides textual overview information on the threads. This is demonstrated in the next screen snapshot.

Any individual thread can be clicked on to see the detailed Threads view, which is also under the “Threads” tab in VisualVM. When “Threads visualization” is checked, a pie chart is included that graphically indicates what each thread is doing. This is demonstrated in the next screen snapshot.

Conclusion

The jstack tool is a useful command-line tool, but VisualVM provides the same capabilities along with a significantly improved presentation and details.

From Dustin's Software Development Cogitations and Speculations

24 Jun

Heap Dump and Analysis with VisualVM

In previous blog posts, I have covered using VisualVM to acquire HotSpot JVM runtime information in a manner similar to jinfo and how to use VisualVM in conjunction with JMX and MBeans in a manner similar to JConsole. This blog posting looks at how VisualVM can be used to generate and analyze a heap dump in a manner similar to that done with command-line tools jmap and jhat.

The jmap (Java Memory Map) tool is one of several ways that a Java heap dump can be generated. The Java Heap Analysis Tool (jhat) TechNotes/man page lists four methods for generating a heap dump that can be analyzed by jhat. The four listed methods for generating a heap dump are the use of jmap, JConsole (Java Monitoring and Management Console), HPROF, and when an OutOfMemoryError occurs when the -XX:+HeapDumpOnOutOfMemoryError VM option has been specified. A fifth approach that is not listed, but is easy to use, is Java VisualVM. (By the way, another method is use of the MXBean called HotSpotDiagnosticMXBean and its dumpHeap(String,Boolean) method.)
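The MXBean route mentioned in the parenthetical can be sketched as follows. This assumes a HotSpot-based JVM that registers the bean under the object name com.sun.management:type=HotSpotDiagnostic; the class name HeapDumper, the default file name heap-sketch.hprof, and the choice to wrap IOException are all decisions made for this example. Recent JDKs also insist that the file name end in .hprof, and the call fails if the file already exists.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: asks the running HotSpot JVM to dump its own heap. */
final class HeapDumper {

    // Object name under which HotSpot registers its diagnostic MXBean.
    private static final String HOTSPOT_BEAN_NAME =
            "com.sun.management:type=HotSpotDiagnostic";

    /** Writes a binary heap dump; liveObjectsOnly=true dumps only reachable objects. */
    static File dumpHeap(String outputFile, boolean liveObjectsOnly) {
        try {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    HOTSPOT_BEAN_NAME,
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(outputFile, liveObjectsOnly);
            return new File(outputFile);
        } catch (IOException e) {
            // Thrown, for example, when the target file already exists.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "heap-sketch.hprof";
        File dump = dumpHeap(target, true);
        System.out.println("Wrote " + dump.length() + " bytes to " + dump.getPath());
    }
}
```

The resulting file is in the same binary format that jmap produces, so it should open in jhat or VisualVM in the same way.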

The jmap tool is simple to use from the command line to produce a heap dump. It can be used against a running Java process whose process ID (pid) is known (available via jps) or against a core file. In this post, I’ll focus on using jmap with a running process’s ID.

The jmap page states that jmap is an experimental tool with relatively limited capabilities on Windows that may not be available with future versions of the JDK. This page also lists options available to specify how jmap should generate a heap dump.

The following screen snapshot shows how jmap can be used to dump a heap.

The generated dump file, dustin.bin in this case, is binary as shown in the next screen snapshot.

The binary heap dump can be read with the jhat tool. Sun’s Java SE 6 included implementation of jhat replaces HAT, which was formerly available as a separate download. It is almost trivial to run jhat. One need only invoke jhat on the heap dump file generated with jmap (or alternative dump generation technique) as shown in the next screen snapshot.

With the heap dump generated (jmap) and the jhat tool invoked, the dump can be analyzed with a web browser. The output on the console tells us that the dump is available on port 7000 (this default port can be overridden with the -port option). When I run the browser on the same machine on which I ran jhat, I can use localhost for the host portion of the URL. The starting page using localhost and port 7000 is shown in the next screen snapshot.

Arbitrary Object Query Language (OQL) statements can be written to find necessary details in the heap dump. The jhat-started web server includes OQL help at the URL http://localhost:7000/oqlhelp/. See also Querying Java Heap with OQL for more details on how to use OQL. However, one can often find what one needs simply using the already provided information and moving between pieces of information using the provided hyperlinks.

The following screen snapshot demonstrates one of the more useful pages available thanks to jhat’s web server-based output of the heap dump. This page shows the number of instances of various Java objects, including platform objects.

A significant aid in understanding what these web pages generated by jhat mean is the VM Specification on Class File Format. In Section 4.3.2 (“Field Descriptors”) of this document, there is a table that shows the mapping of field descriptor characters to the data type we use. According to this table, “B” indicates a byte, “C” indicates a char, “D” indicates a double, “F” indicates a float, “I” indicates an integer, “J” indicates a long, “L<someClassName>” indicates a reference (instance of a class), “Z” indicates a boolean, and [ indicates an array.
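To make the mapping concrete, here is a small sketch that turns a field descriptor into a readable type name. The class name FieldDescriptors and its decode method are hypothetical; the letter table follows the one quoted above, plus “S” for short, which the same section of the specification also defines. Class names are accepted with either slashes or dots between package segments.

```java
/** Hypothetical helper: decodes JVM field descriptors such as "[I" or "Ljava/lang/String;". */
final class FieldDescriptors {

    static String decode(String descriptor) {
        // Each leading '[' adds one array dimension.
        int dimensions = 0;
        while (dimensions < descriptor.length() && descriptor.charAt(dimensions) == '[') {
            dimensions++;
        }
        StringBuilder name = new StringBuilder(baseTypeName(descriptor.substring(dimensions)));
        for (int i = 0; i < dimensions; i++) {
            name.append("[]");
        }
        return name.toString();
    }

    private static String baseTypeName(String base) {
        // Reference types look like L<someClassName>; with a trailing semicolon.
        if (base.length() > 2 && base.startsWith("L") && base.endsWith(";")) {
            return base.substring(1, base.length() - 1).replace('/', '.');
        }
        if (base.length() == 1) {
            switch (base.charAt(0)) {
                case 'B': return "byte";
                case 'C': return "char";
                case 'D': return "double";
                case 'F': return "float";
                case 'I': return "int";
                case 'J': return "long";
                case 'S': return "short";
                case 'Z': return "boolean";
                default: break;
            }
        }
        throw new IllegalArgumentException("Not a field descriptor: " + base);
    }

    public static void main(String[] args) {
        for (String descriptor : new String[] {"[C", "[[I", "Ljava/lang/String;", "J"}) {
            System.out.println(descriptor + " -> " + decode(descriptor));
        }
    }
}
```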

So far, I have looked at using jmap and jhat from the command-line to generate a heap dump and provide a web browser-based method for analyzing the generated heap dump. Although these tools are relatively easy to use, VisualVM provides similar functionality in an even easier approach.

One method for generating a heap dump in VisualVM is to simply right-click on the desired process and select “Heap Dump”. This method is shown in the next screen snapshot.

This generates the heap dump as indicated by its name underneath the Java process.

A second approach for generating a heap dump with VisualVM is to click on the Java process of interest so that relevant tabs (“Overview”, “Monitor”, “Threads”, and “Profiler”) come up in VisualVM. Selecting the “Monitor” tab provides the “Heap Dump” button as shown in the next screen snapshot.

Clicking on the “Heap Dump” button leads to a heap dump being generated just as it was with the right click option described above. This is shown in the next screen snapshot, which happens in this case to show the “Summary” tab of the analyzed heap dump.

In addition to the “Summary” tab of the heap dump analysis, other interesting details from the heap dump are presented in the “Classes” tab. This tab includes horizontal bar charts that graphically indicate the percentage of total instances that are associated with each class. An example is shown in the next screen snapshot.

The displayed classes are spelled out rather than using symbols like those described above for jhat-based heap dump analysis. One can right-click on any class in the “Classes” tab and select “Show in Instances View” to see details on each individual instance of the selected class. This is shown in the next screen snapshot.

Conclusion

VisualVM provides several advantages when creating and analyzing heap dumps. First, everything from creation to analysis is in one place. Second, the data is provided in what may be considered a more presentable format with graphical support. Finally, other tools can also be used in VisualVM in conjunction with the heap dump analysis. VisualVM provides one-stop shopping for many of the development, debugging, and performance analysis needs of the Java developer.

Additional References

Troubleshooting Java SE

Troubleshooting Guide for Java SE 6 with HotSpot JVM (PDF)

Java SE 6 Performance White Paper

What’s in My Java Heap?

Analyzing Java Heaps with jmap and jhat

Java Memory Profiling with jmap and jhat

From Dustin's Software Development Cogitations and Speculations

24 Jun

Scaling Up vs. Scaling Out: Hidden Costs

In My Scaling Hero, I described the amazing scaling story of plentyoffish.com. It’s impressive by any measure, but also particularly relevant to us because we’re on the Microsoft stack, too. I was intrigued when Markus posted this recent update:

Last monday we upgraded our core database server after a power outage knocked the site offline. I haven’t touched this machine since 2005 so it was a major undertaking to do it last minute. We upgraded from a machine with 64 GB of ram and 8 CPUs to a HP ProLiant DL785 with 512 GB of ram and 32 CPUs

The HP ProLiant DL785 G5 starts at $16,999 — and that’s barebones, with nothing inside. Fully configured, as Markus describes, it’s kind of a monster:

  • 7U size (a typical server is 2U, and mainstream servers are often 1U)
  • 8 CPU sockets
  • 64 memory sockets
  • 16 drive bays
  • 11 expansion slots
  • 6 power supplies

It’s unclear if they bought it pre-configured, or added the disks, CPUs, and memory themselves. The most expensive configuration shown on the HP website is $37,398 and that includes only 4 processors, no drives, and a paltry 32 GB memory. When topped out with ultra-expensive 8 GB memory DIMMs, 8 high end Opterons, 10,000 RPM hard drives, and everything else — by my estimates, it probably cost closer to $100,000. That might even be a lowball number, considering that the DL785 submitted to the TPC benchmark website (pdf) had a “system cost” of $186,700. And that machine only had 256 GB of RAM. (But, to be fair, that total included another major storage array, and a bunch of software.)

At any rate, let’s assume $100,000 is a reasonable ballpark for the monster server Markus purchased. It is the very definition of scaling up — a seriously big iron single server.

But what if you scaled out, instead — Hadoop or MapReduce style, across lots and lots of inexpensive servers? After some initial configuration bumps, I’ve been happy with the inexpensive Lenovo ThinkServer RS110 servers we use. They’re no match for that DL785 — but they aren’t exactly chopped liver, either:

Lenovo ThinkServer RS110 barebones $600
8 GB RAM $100
2 x eBay drive brackets $50
2 x 500 GB SATA hard drives, mirrored $100
Intel Xeon X3360 2.83 GHz quad-core CPU $300

Grand total of $1,150 per server. Plus another 10 percent for tax, shipping, and so forth. I replace the bundled CPU and memory that the server ships with, and then resell the salvaged parts on eBay for about $100 — so let’s call the total price per server $1,200.

Now, assuming a fixed spend of $100,000, we could build 83 of those 1U servers. Let’s compare what we end up with for our money:

 

           Scaling Up    Scaling Out
CPUs       32            332
RAM        512 GB        664 GB
Disk       4 TB          40.5 TB

Now which approach makes more sense?

(These numbers are a bit skewed because that DL785 is at the absolute extreme end of the big iron spectrum. You pay a hefty premium for fully maxxing out. It is possible to build a slightly less powerful server with far better bang for the buck.)
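The scale-out column is straightforward arithmetic on the per-server figures, which the sketch below reproduces. The class name ScaleOutBudget is made up for this example, and two details are assumptions rather than anything stated in the post: each quad-core Xeon is counted as 4 CPUs, and the mirrored pair of 500 GB drives is counted as 500 GB usable, converted to TB by dividing by 1,024, which is what lands on 40.5 TB.

```java
/** Hypothetical helper: reproduces the scale-out column from the per-server numbers. */
final class ScaleOutBudget {

    static final int BUDGET_USD = 100_000;
    static final int COST_PER_SERVER_USD = 1_200;     // rounded figure from the post
    static final int CORES_PER_SERVER = 4;            // assumption: one quad-core Xeon X3360
    static final int RAM_GB_PER_SERVER = 8;
    static final int USABLE_DISK_GB_PER_SERVER = 500; // assumption: 2 x 500 GB, mirrored

    /** Sum of the itemized parts list, before tax, shipping, and resale. */
    static int partsTotalUsd() {
        return 600 + 100 + 50 + 100 + 300;
    }

    /** Whole servers that fit in the budget (integer division rounds down). */
    static int serverCount() {
        return BUDGET_USD / COST_PER_SERVER_USD;
    }

    static int totalCores() {
        return serverCount() * CORES_PER_SERVER;
    }

    static int totalRamGb() {
        return serverCount() * RAM_GB_PER_SERVER;
    }

    /** Assumption: 1 TB = 1,024 GB. */
    static double totalDiskTb() {
        return serverCount() * USABLE_DISK_GB_PER_SERVER / 1024.0;
    }

    public static void main(String[] args) {
        System.out.println("Parts per server: $" + partsTotalUsd());
        System.out.println("Servers: " + serverCount());
        System.out.println("CPUs: " + totalCores());
        System.out.println("RAM: " + totalRamGb() + " GB");
        System.out.printf("Disk: %.1f TB%n", totalDiskTb());
    }
}
```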

But there’s something else to consider: software licensing.

 

           Scaling Up    Scaling Out
OS         $2,310        $33,200*
SQL        $8,318        $49,800*

(If you’re using all open source software, then of course these costs will be very close to zero. We’re assuming a Microsoft shop here, with the necessary licenses for Windows Server 2008 and SQL Server 2008.)

Now which approach makes more sense?

What about the power costs? Electricity and rack space aren’t free.

 

                     Scaling Up    Scaling Out
Peak Watts           1,200w        16,600w
Power Cost / Year    $1,577        $21,815

Now which approach makes more sense?

I’m not picking favorites. This is presented as food for thought. There are at least a dozen other factors you’d want to consider depending on the particulars of your situation. Scaling up and scaling out are both viable solutions, depending on what problem you’re trying to solve, and what resources (financial, software, and otherwise) you have at hand.

That said, I think it’s fair to conclude that scaling out is only frictionless when you use open source software. Otherwise, you’re in a bit of a conundrum: scaling up means paying less for licenses and a lot more for hardware, while scaling out means paying less for the hardware, and a whole lot more for licenses.

* I have no idea if these are the right prices for Windows Server 2008 and SQL Server 2008, because reading about the licensing models makes my brain hurt. If anything, it could be substantially more.
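The licensing and power rows can be approximated with a little arithmetic. The sketch below works backwards from the published totals, so its inputs are inferred assumptions, not figures from the post: $400 per server for the OS and $600 per server for SQL (the scale-out totals divided by 83), 200 watts per server (16,600 divided by 83), and an electricity rate of $0.15 per kWh, which reproduces the $1,577 figure for a constant 1,200 watt draw. The class name RunningCosts is also made up for this example.

```java
/** Hypothetical helper: back-of-the-envelope licensing and power totals. */
final class RunningCosts {

    static final int HOURS_PER_YEAR = 24 * 365; // 8,760

    /** Yearly cost of a constant draw; the rate is an assumed input, not a quoted price. */
    static double annualPowerCostUsd(double watts, double usdPerKwh) {
        return watts / 1000.0 * HOURS_PER_YEAR * usdPerKwh;
    }

    /** One license per server; the per-server price is inferred from the totals. */
    static int licenseTotalUsd(int servers, int usdPerServer) {
        return servers * usdPerServer;
    }

    public static void main(String[] args) {
        int servers = 83;
        double assumedRate = 0.15; // USD per kWh, inferred from the $1,577 figure

        System.out.println("OS licenses:  $" + licenseTotalUsd(servers, 400));
        System.out.println("SQL licenses: $" + licenseTotalUsd(servers, 600));
        System.out.println("Scale-up power / year:  $"
                + Math.round(annualPowerCostUsd(1_200, assumedRate)));
        System.out.println("Scale-out power / year: $"
                + Math.round(annualPowerCostUsd(servers * 200, assumedRate)));
    }
}
```

Under these assumptions the scale-out power figure comes out at about $21,812 rather than $21,815; the published number matches scaling the rounded $1,577 by 16,600/1,200, so the small gap looks like rounding.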

From Coding Horror

23 Jun

JMX 2 Postponed Until Java SE 8

It was disappointing, but not altogether surprising, to learn that JMX 2 will not be part of Java SE 7. Anyone who saw my Colorado Software Summit 2008 presentation JMX Circa 2008 is aware of how excited I was for some of the new features that were tentatively planned for Java SE 7. Java Management Extensions (JMX) is already a highly useful technology, but the advancements in JMX hoped for in Java SE 7 are convenient and welcome. With the long period of time between Java SE 6 and Java SE 7, it likely means quite a wait for JMX 2 features that are now expected with Java SE 8.

From Dustin's Software Development Cogitations and Speculations
