Rinehart on Java Coding Conventions

Introduction

In my paper "Java Coding Conventions" I discussed Sun's Java coding conventions and four other serious conventions. I took pains to present all five positions accurately and to avoid injecting my own opinions and biases. At times, that was not easy. This paper entirely reverses that effort at neutrality. It applauds and pans the existing conventions. Where there is disagreement, I take sides. Where there is agreement, I reserve the right to disagree.

Is this merely one man's opinion? Sure. But that one man has written Java since Java 1.0. He's written in other programming languages since 1965. Hopefully, he's learned a bit along the way. At the very least, he's earned the right to be crotchety.


Java Coding Conventions

© 2005, Martin Rinehart
from www.MartinRinehart.com


Table of Contents

Introduction
Why So Many Conventions?
References to Conventions
Comment Types
Organization
 
1 General
1.1 Conventions About Conventions
1.2 Language
1.3 Implementation Comments
1.4 Acronyms
1.5 Other
 
2 Package
2.1 Package Names
2.2 Package Comments
 
3 Compilation Unit
3.1 Compilation Unit Names
3.2 Compilation Unit Comments
3.3 Compilation Unit Format
    3.3.1 Compilation Unit Sections
    3.3.2 Package and Import Statements
    3.3.3 Line Length and Breaks
    3.3.4 Indentation
3.4 Compilation Unit Other
 
4 Type (Class and Interface)
4.1 Type Names
4.2 Type Comments
4.3 Type Format
    4.3.1 Format of the Type Statement
    4.3.2 Items Within the Type
    4.3.3 Order of Methods
4.4 Type Other
 
5 Class-Wide Data
5.1 Class-Wide Data Names
5.2 Class-Wide Data Comments
5.3 Class-Wide Data Format
5.4 Class-Wide Data Other
6 Methods
6.1 Method Names
6.2 Method Comments
6.3 Method Format
6.4 Method Other
 
7 Statements Within Methods
7.1 Variable Names
7.2 Statement Comments
7.3 Statement Formats
    7.3.1 Agreed Statement Conventions
    7.3.2 Formatting the Ternary Expression
    7.3.3 Declarations Formats
    7.3.4 Block Statement Formats
    7.3.5 Switch Formats
    7.3.6 Try/Catch/Finally Formats
    7.3.7 Miscellaneous Formatting
7.4 Statement Other
 
Appendix — Convention Coverage Map


Introduction

Sun has provided coding conventions for Java programmers (reproduced here at Code Conventions for the Java™ Programming Language and available at Sun). It would seem that an organization interested in coding conventions should adopt these and add whatever extensions are appropriate to its unique needs. It would be nice if life were so simple.

Where I have comments to make, I'll add them by splitting the paper in half. The original will be on the left, and my comments will be here on the right.

If I don't comment, you may infer that I am in agreement. I'm not bashful.

Why So Many Other Conventions?

Given the existence of Sun's conventions, why has an extensive crop of other conventions grown in the Java community? There are two main reasons: first, Sun's conventions are far from exhaustive; second, in some cases the other standards simply disagree with Sun. In this article I'll cover the Sun conventions and four other thoughtful conventions. We'll see where they agree, where they fill in each other's gaps and where they disagree.

In alphabetical order, the non-Sun conventions are Ambler, Caltech, Geosoft and the New England Java User Group (NEJUG).

References to Conventions

Sun's document is reproduced here with fresh HTML so you can link directly to, for example, (6 Declarations — 6.1 Number Per Line) [Sun 6.1].

Ambler numbers sections (6 Classes, Interfaces, ... — 6.1 Classes — 6.1.1 Class Visibility). A reference to class visibility in Ambler is [Ambler 6.1.1].

Caltech does not provide a comparable navigation feature but it does provide a convenient, hyper-linked table of contents at the top. The reference to a naming convention will be [Caltech Naming Conventions].

Geosoft's organization is similar to Sun and Ambler (6 Layout and Comments — 6.1 Layout) but it numbers each convention consecutively. The first convention under 6.1 is "58 Basic indentation ...". This will be referred to as [Geosoft 58].

The New England Java User Group proposes Standards (must be followed — STD), Styles (should be followed, but some room for decision — STY) and Conventions (no recommendation, but make up your own mind and stick to your decision — CON). Style 13, Ternary Statement Usage, will be referred to as [NEJUG STY-13].

I have tried to be complete. For example, 2.2 Package Comments includes references to Ambler and to Caltech. This implies that the other three conventions do not address this issue.

Comment Types

In this document comments delimited by /* ... */ are called "block comments", while comments that run from // to the end of the line are called "EOL comments". Some writers call block comments "C-style" comments and EOL comments "C++-style" comments. A comment block (multiple lines of comment in the code) may be enclosed in a single block comment, or each line may be an EOL comment.
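A minimal sketch of the two comment types may help fix the terms. The class and method names are invented for this illustration and come from none of the conventions.

```java
// Hypothetical class, used only to illustrate the two comment types.
final class CommentTypes {

    /* A block comment: everything between the delimiters is ignored,
       so a single pair of delimiters can enclose
       a whole multi-line comment block. */
    static int twice(int n) {
        return 2 * n; // an EOL comment runs to the end of its line
    }

    // The same kind of comment block can instead be written
    // with every line as its own EOL comment,
    // as these three lines are.
    static int thrice(int n) {
        return 3 * n;
    }
}
```

Either form compiles to exactly the same code; the choice is purely one of convention.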

Organization

The logic underlying Sun's organization was not apparent to this author. (In point of fact, I didn't think there was any underlying logic.) The other conventions attempt to improve on this with varying degrees of success. Here I'm going back to the basic organization of Java itself:
  1. General
  2. Package
  3. Compilation Unit
  4. Type (Class and Interface)
  5. Class-Wide Data
  6. Methods
  7. Statements Within Methods
The General category covers items applicable to most or all of the other topics. For example, what language should we use for names and comments?

This paper doesn't cover "good advice" conventions that programmers using any language should know (choose meaningful variable names, for example). This paper also excludes requirements (the compiler reads files with a ".java" suffix). It focuses on choices the programmer needs to make. This document is not a substitute for the referenced documents — they include many useful examples and discussions of the options. The goal here is a terse summary that lets you see where the standards agree, disagree and are silent.

1 General

  1.1 Conventions About Conventions

[Ambler 1.2] and [Geosoft 1] agree that violation of conventions is acceptable if the violation improves readability. Ambler insists that any violation must be documented.

Ambler's Law ([Ambler 1.5]) states that conventions are preferred in order of increasing generality. Industry conventions are better than organization conventions; organization conventions are better than project conventions, which in turn are better than personal conventions, which are better than no conventions.

I agree with both. The "violation" should be documented in cases where that documentation would be clarifying. Perhaps the "violation" suggests a need for improvement in the standard?

Ambler's Law is spot on.

  1.2 Language

What is the appropriate language for names and comments? Geosoft (a Norwegian company) specifies that both should be written in English [Geosoft 10, 77] as English is the international programming language. [Ambler 1.4] specifies the use of the writer's native language for names. (The other three conventions would have been noted here if they were not silent on this point.)

Polar opposite opinions here. First, if your native language is English, there is no debate. This is a decision for those whose native language is not English. I side with the Norwegians. Your code may be maintained by your fellow countrymen, or maintenance may be outsourced to India, for example.

As a native English speaker, I realize that this must sound most unfair. Frankly, it doesn't just sound unfair, it is unfair. However, for better or worse, English has become the international language. Globalization may not always be to our liking, but it is certainly a fact.

  1.3 Implementation Comments

Comments are divided into documentation (javadoc) and implementation (how and why) comments. [NEJUG STD-7] declares implementation comments are required. [Sun 5.1] encourages the use of implementation comments throughout the code. [Sun 5] and [Ambler 1.4] agree that comments should not be enclosed in banners.

  1.4 Acronyms

For names written in UpperAndLowercase (types), or lowerAndUppercase (methods and variables), acronyms should be capitalized as if they were words (call a sqlMethod(), create anHtmlPage) according to [Ambler 1.3] and [NEJUG STD-2].
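As a small sketch of this rule (the type and member names below are invented for illustration), the acronyms HTML and SQL are capitalized like ordinary words wherever they fall in a name:

```java
// Hypothetical names chosen to show acronyms capitalized as ordinary words.
final class HtmlPage {                  // type name: "Html", not "HTML"
    private final String sqlQuery;      // variable name: "sql", not "SQL"

    HtmlPage(String sqlQuery) {
        this.sqlQuery = sqlQuery;
    }

    String getSqlQuery() {              // acronym in mid-name: "Sql", not "SQL"
        return sqlQuery;
    }
}
```

A caller would then write something like `HtmlPage anHtmlPage = new HtmlPage("SELECT 1");`, which keeps word boundaries visible even where two acronyms meet.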

  1.5 Other

Other NEJUG general conventions include being consistent within a file (if the style isn't written per your conventions, either change it all or make changes in the file conforming to the file's conventions) — [NEJUG STD-8]. Prefixes indicating variable scope are suggested as an option in [NEJUG STY-27]. Methods and classes that are just roughed in for future use are discouraged by [NEJUG CON-10]. The Error class should not be subclassed ([NEJUG CON-19]); checked and unchecked Exceptions should be distinguished ([NEJUG CON-20]) and Exceptions thrown should be enhanced with additional data ([NEJUG CON-21]).

[Caltech Structure] recommends that javadoc comments end with **/. The other standards do not specifically address this but all their examples show javadoc ending with */.

No convention specifies that javadoc comments have empty top and bottom lines, but all examples are done this way:

    /**
     * doc comment here
     */

Pet peeve. One of the resources that is still scarce is vertical space. My code editor shows about 60 lines. The more code I can see at once, the happier I am. Therefore, I don't waste vertical space. My standard is:

    /** doc comment here */

2 Package

  2.1 Package Names

All conventions except Ambler ([Sun 9.1], [Caltech Naming Conventions], [Geosoft 2], [NEJUG STD-1]) specify that package names will be all lowercase. [Sun 3.1.2] specifies a reversed-order URL (strictly, a reversed Internet domain name; my packages would be com.martinrinehart.xxx), which guarantees uniqueness but may be cumbersome and certainly can't be followed by those who do not have a site of their own. [Caltech Structure, Naming Conventions] and [NEJUG STD-1] fully support Sun. [Ambler 6.3.1] and [Geosoft 2] do not require use of a reversed URL for packages not intended for distribution to others. Ambler and NEJUG point out that "java" and "javax" are reserved by Sun. Ambler states that package names should be singular, not plural.
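The reversed-name rule is mechanical enough to sketch in code. The helper below is hypothetical (none of the conventions defines such a method) and assumes the domain contains only characters that are legal in Java identifiers; it does not handle hyphens or leading digits.

```java
import java.util.Locale;

// Hypothetical helper: builds an all-lowercase package prefix from a domain name.
final class PackageNames {

    /** Reverses the dot-separated parts of a domain and lowercases the result. */
    static String fromDomain(String domain) {
        String[] parts = domain.toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder prefix = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            prefix.append(parts[i]);
            if (i > 0) {
                prefix.append('.');
            }
        }
        return prefix.toString();
    }
}
```

Under these assumptions, `PackageNames.fromDomain("MartinRinehart.com")` yields `com.martinrinehart`, the prefix quoted above.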

  2.2 Package Comments

[Ambler 6.3.2] specifies that an external document, named "[package.name].html", provide the rationale for the package and a list of classes. [Caltech Structure] requires package documentation in a file named "package.html". Ambler argues that multiple files with the same name, even though in different directories, will eventually be a source of trouble. Caltech's recommendation takes advantage of the Javadoc tool's ability to incorporate "package.html" into the API documentation. Javadoc includes a list of classes taken from the source files in the package directory.

I share Ambler's opinion about multiple files with the same name. It's a problem waiting to happen. However, it's what Sun's javadoc tool requires. I don't think that was a good design decision, but until Sun does something better I side with Caltech.

3 Compilation Unit

  3.1 Compilation Unit Names

(Requirements enforced by the compiler — name the unit the same as the name of the public type with a ".java" suffix — are not considered conventions.)

  3.2 Compilation Unit Comments

[Sun 3.1.1] specifies that the file (more correctly, the compilation unit, though to date that is almost always a single file) begin with a block comment specifying the classname, version, date and copyright notice. [Caltech Structure] similarly specifies a block comment giving the project, copyright, version control ID (for its version-control system), list of classes (if more than one) and principal entry point, if any. [Ambler 6.4.2] also includes an optional file name, copyright notice and list of classes.

[Sun 3] also suggests an optional comment identifying the sections of the file.

Embedded HTML tags in comments are specified by [Caltech Documentation] in an effort to make intra-file navigation as simple as Javadoc navigation. (Caltech does not claim to have solved the problem, only to be working on a solution.)

  3.3 Compilation Unit Format

There is general agreement that less is more when it comes to the types in a single compilation unit. [Sun 3.1] specifies a single public type followed by zero or more private ones. [Caltech Structure] specifies one type per file except for pure "one-shot" (supporting the main type, never a candidate for independence) classes. Similarly, [Geosoft 30] allows just one type plus associated inner classes.

Some time ago I looked at my own inner classes most critically. They almost all proved how "clever" the programmer was. Code that proves the author is "clever" is bad, bad code. It should be rewritten in a way that is not one little bit clever.

I have not, since coming to that realization, used an inner class. I have not missed them. My code, when it's good code, does obvious things in obvious ways, without any trace of cleverness.

    3.3.1 Compilation Unit Sections

[Sun 3] specifies that the sections of the file are separated by blank lines; [Sun 8.1] specifies that exactly two blank lines should be used between the types. [NEJUG STY-28] concurs. By inference, separate types are "sections" of the compilation unit. It is not clear if divisions within a type (static variables, instance variables, constructors and other methods) constitute separate sections.

The order within a file is beginning comments, package and import statements, then class and interface declarations, per [Sun 3.1.1] and [NEJUG STY-1]. The public type comes before any others.

If you swear off inner classes, as I have, there is only one type in each compilation unit, so there is no debate about how to separate types.

The subject of order within compilation units is, unfortunately, not part of these standards. It should be. Having one order makes it easy to find things. My order within the type is:

  • type-wide data: public, then protected, last private
  • main (often used for testing, typically commented out)
  • constructor(s): fewest to most parameters
  • methods: getters and setters, then other publics, then protected, last private
  • toString(), which I always code but put last, as it's typically unimportant and uninteresting
Within any category (private data, for instance), items are ordered alphabetically. My order has the virtue of being ordered. If some other order were widely adopted I would happily change.
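As a rough sketch of that order, a small type might be laid out as follows. The class and its members are invented for illustration; only the arrangement matters.

```java
// Hypothetical class laid out in the order listed above.
class Account {

    // type-wide data: public, then protected, last private
    public static final int MAX_NAME_LENGTH = 40;
    protected String name;
    private long balance;

    // main: used for testing, then commented out
    // public static void main(String[] args) {
    //     System.out.println(new Account("test", 100));
    // }

    // constructors: fewest to most parameters
    Account(String name) {
        this(name, 0);
    }

    Account(String name, long balance) {
        this.name = clip(name);
        this.balance = balance;
    }

    // getters and setters, alphabetically
    public long getBalance() {
        return balance;
    }

    public String getName() {
        return name;
    }

    public void setBalance(long balance) {
        this.balance = balance;
    }

    // other public methods
    public void deposit(long amount) {
        balance += amount;
    }

    // private methods
    private static String clip(String s) {
        return s.length() <= MAX_NAME_LENGTH ? s : s.substring(0, MAX_NAME_LENGTH);
    }

    // toString comes last
    @Override
    public String toString() {
        return name + ": " + balance;
    }
}
```

With a fixed layout like this, a reader who wants the constructors knows to look just below where the commented-out main sits, whatever the class.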

    3.3.2 Package and Import Statements

[Geosoft 35] specifies that the package statement must be present (forbidding the use of the default package).

The imports should use "*" (as in java.awt.*) according to [Ambler 7.2], and shouldn't use "*", according to [Ambler 7.2.1] and [Caltech Structure]. [NEJUG STY-3] considers both positions and concludes that "*" should be used for standard (Sun) classes but not for your own classes. The order of imports is java.xxx first, javax.xxx next and then others alphabetically according to [Caltech Structure] and [Geosoft 36]. [NEJUG STY-2] suggests that third-party packages be imported between Sun's and your own. [Sun 3.1.2] only says that imports follow the package statement.

A hot-key in my IDE searches my code and then neatly lays out imports for exactly the types that need importing, and no more. Since my IDE can be downloaded without cost, this rather exacting (and formerly time-consuming) standard should be universal.
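The Caltech and Geosoft ordering (java.xxx first, javax.xxx next, then the others alphabetically) can be sketched as a small sort. This is only an illustration under stated assumptions: the class ImportOrder is hypothetical, it works on plain type names rather than real import statements, and it sorts alphabetically within the java and javax groups, which the conventions as summarized here do not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that orders import names: java.*, then javax.*, then the rest.
final class ImportOrder {

    /** Rank 0 for java.*, 1 for javax.*, 2 for everything else. */
    private static int rank(String name) {
        if (name.startsWith("java.")) {
            return 0;
        }
        if (name.startsWith("javax.")) {
            return 1;
        }
        return 2;
    }

    /** Returns a new list sorted by rank, then alphabetically within each rank. */
    static List<String> sort(List<String> imports) {
        List<String> sorted = new ArrayList<>(imports);
        sorted.sort((a, b) -> {
            int byRank = Integer.compare(rank(a), rank(b));
            return byRank != 0 ? byRank : a.compareTo(b);
        });
        return sorted;
    }
}
```

The NEJUG variant, with third-party packages between Sun's and your own, would need a fourth rank and some way of knowing which packages are yours, so it is left out of this sketch.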

    3.3.3 Line Length and Breaks

Line length should be limited to 80 characters according to [Sun 4.1], [Geosoft 32] and [NEJUG STY-6]. [Caltech Structure] specifies 78 characters. Sun specifies 70 characters for documentation. With modern tools, I'm not sure I see why this matters.

Over-length lines must be broken. The broken part is indented ([Sun 4.1], [Ambler 2.4.3], [Geosoft 34] and [NEJUG STY-8]). Breaks are after commas and operators (except Sun, which specifies breaking before binary operators). Geosoft and NEJUG strongly disagree with Sun, stating that the "broken" condition of the line should be made as obvious as possible, which means the break is after the operator.

On breaking after binary operators, Geosoft and NEJUG are right.

When lines are broken, the parts should be the largest possible lexical units. For example:
Preferred:

        var_name =
            complete + expression;

Inferior:

        var_name = broken +
            expression;

    3.3.4 Indentation

While the need for indenting is universally noted, the exact nature of indenting is not agreed. The positions are:
  • 4 spaces, with tabs set every 8 spaces [Sun 4]
  • Using a tab is the only sensible decision [Ambler 2.4.2 footnote 2]
  • 2 spaces and tab characters are not allowed [Caltech Structure]
  • 2 spaces and neither tabs nor page breaks are allowed [Geosoft 33, 58]
  • 2, 3, 4 or 8 spaces (pick one and stick to it) and tabs are not allowed [NEJUG STY-9, 10].
The argument against tabs is that if your tab stops are set differently than mine, we won't see the source code the same way.

Ambler is almost right that using a tab is the only sensible decision, but he's right only about typing. Sun's just silly; they must have a screwball code editor. The other three require spaces, not tabs, for the sensible reason that the display of the source code in your editor will be the same as the display was in the programmer's editor, regardless of your tab setting. So get a good program editor, set tabs to four spaces and tell it to save as spaces, not tabs. (If you don't have that feature, go back to "get a good program editor.")

I've occasionally gone less, but only under special circumstances. I once wrote for a magazine that had a 52-character limit to keep the code within its column width. So I used sub-standard tabs. That demonstrates how special conditions can force changes in otherwise sensible conventions. Four is a common, sensible tab width.
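What "save as spaces, not tabs" amounts to can be sketched in a few lines. The helper below is hypothetical and is not taken from any editor or convention; it assumes a single line of text (no embedded newlines) and a positive tab width.

```java
// Hypothetical helper: replaces each tab with spaces up to the next tab stop.
final class TabExpander {

    static String expand(String line, int tabWidth) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                // Pad with spaces to the next multiple of tabWidth.
                int pad = tabWidth - (out.length() % tabWidth);
                for (int j = 0; j < pad; j++) {
                    out.append(' ');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Once a file has been through something like `TabExpander.expand(line, 4)`, it looks the same in every editor, which is the whole argument for spaces.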

  3.4 Compilation Unit Other

[Sun 3] and [NEJUG CON-5] say that the file should not exceed 2,000 lines.

I'm not proposing a standard here; this is just a personal observation. I'm generally happy with my classes until somewhere around 1,000 lines. Then they start to feel big and navigation, even with a good IDE, starts to get cumbersome. When that happens I ask, "How much more code do I need?" If the class is nearly complete, I'll just get it done. Otherwise, it's time for seriously thinking about somehow breaking the structure into smaller pieces.

4 Type (Class and Interface)

  4.1 Type Names

Type names should be written using the UpperAndLower convention ([Sun 9.2], [Ambler 1.3, 6.1.2], [Geosoft 3], and [NEJUG STD-2]). Class names should be nouns ([Sun 9.2], [Ambler 6.1.2], [Caltech Naming Conventions] and [NEJUG STD-2]). According to [Caltech Naming Conventions] interface names should end with "able" or the word "Interface". According to [Ambler 6.2.1] interface names should be adjectives, like Runnable, or nouns like DataInput.

[Caltech Naming Conventions] also suggests starting with "Abstract" and ending with "Factory" and "Exception" for the respective class types, and ending with "Impl" for an implementation. [Geosoft 27, 28] specifies starting with "Default" and ending with "Exception" for those types.

[Ambler 1.3] recommends against names longer than 15 characters.
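Taken together, these naming suggestions might produce a family of types like the one sketched below. Every name is invented for illustration, and reading Geosoft's "Default" prefix as marking a default implementation is my assumption.

```java
// Hypothetical types; only the names matter here.
interface Drawable {                                 // interface: adjective ending in "able"
    String draw();
}

abstract class AbstractShape implements Drawable {   // abstract class: "Abstract" prefix
    @Override
    public String draw() {
        return "shape";
    }
}

class DefaultShape extends AbstractShape {           // "Default" prefix (assumed meaning)
}

class ShapeException extends Exception {             // exception: "Exception" suffix
    ShapeException(String message) {
        super(message);
    }
}

final class ShapeFactory {                           // factory: "Factory" suffix
    static Drawable create(String kind) throws ShapeException {
        if ("default".equals(kind)) {
            return new DefaultShape();
        }
        throw new ShapeException("unknown kind: " + kind);
    }
}
```

All of these names are nouns or adjectives in UpperAndLower form, and none exceeds Ambler's 15-character limit.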

  4.2 Type Comments

A javadoc comment precedes the type statement ([Sun 5.2], [Ambler 1.4.1], [Caltech Structure], [Geosoft 37], [NEJUG STD-6]). According to [Caltech Structure] this should include one @version and at least one @author tag. According to [Ambler 6.1.3] the javadoc comment should include purpose, known bugs, code history, any invariants and concurrency issues. For an interface, [Ambler 6.2.2] specifies that the comment include how to use and how not to use the interface.

An implementation comment should follow the type statement, per [Sun 5.2].

// end of class Xxx may follow the closing brace, per [Caltech Structure].

An // end of ... comment should follow every closing brace except those that are within a few lines of their opening mate. Comments are free. Lack of comments can be expensive.
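Put together, the type comments described in this section might look like the sketch below. The class, the author name and the version string are placeholders invented for illustration.

```java
/**
 * Counts events. Hypothetical class used only to illustrate type comments.
 * Known bugs: none. Concurrency: not thread-safe.
 *
 * @author  A. Programmer (placeholder)
 * @version 1.0 (placeholder)
 */
class EventCounter {
    /* Implementation comment, placed after the type statement:
       the count is a plain int, so it overflows after
       Integer.MAX_VALUE events. */

    private int count;

    int getCount() {
        return count;
    }

    void record() {
        count++;
    }

} // end of class EventCounter
```

The javadoc comment says what the type is for; the implementation comment says how it works and where it might bite; the closing comment tells you which brace you are looking at.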

  4.3 Type Format

    4.3.1 Format of the Type Statement

Only [Geosoft 60] specifies the format of the type statement:

type name extends superclassname
  implements interface list

type name
    extends superclassname
    implements interface // single interface

type name
    extends superclassname
    implements
        interface1,
        interface2,
        ...
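Written out in Java, the multi-line layouts above might look like this. The classes are invented for illustration, and the placement of the opening brace is my assumption, since the layouts shown do not include it.

```java
import java.io.Serializable;

// Hypothetical classes; only the layout of the declarations matters here.
class Shape {
}

// Single interface: "extends" and "implements" each on its own indented line.
class Circle
    extends Shape
    implements Serializable {
}

// Several interfaces: one per line, indented below "implements".
class Rectangle
    extends Shape
    implements
        Cloneable,
        Serializable {
}
```

One interface per line costs vertical space, but adding or removing an interface then changes exactly one line.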

    4.3.2 Items Within the Type

Within the type, the order is:
  1. javadoc comment
  2. class or interface statement
  3. type implementation block comment if necessary
  4. static variables (public, then protected, then package, then private)
  5. instance variables (same order)
  6. constructors
  7. other methods
This is agreed by [Sun 3.1.3], [Geosoft 37] and [NEJUG STY-4,5].

Agreed as far as it goes, but it doesn't go far enough. I would add:
  • constants precede all variables
  • main() precedes constructors (it may contain test code and be commented out)
  • methods in same visibility order as variables
  • getters and setters precede other publics
  • completely uninteresting methods, such as toString, at end
  • everything in alphabetical order within each category

No convention mentions the main() method. This is a problem for me as I almost always write a main() for testing and comment it out after testing is completed. Once commented out, it is invisible to my IDE so I cannot navigate to it, or I couldn't if I didn't know that it immediately precedes the first constructor.

Note that this editorial comment was inserted in the original. I thought it important enough to break my own rules, and have not changed my opinion.

    4.3.3 Order of Methods

The order for methods is definitely not agreed. It is one of:
  • Grouped by functionality, not scope or access — [Sun 3.1.3]
  • Methods alphabetically after constructors (unless functional grouping is documented) — [NEJUG STY-5]
  • Constructors, finalize(), then in inverse visibility — public, protected, package, private (data and methods combined); statics may precede non-statics within each group — [Ambler 6.1.4.2]
  • Constructors, finalizers, initializers, statics, publics with private helpers. Static initializers go between static and instance fields. Keep statics together, instance together and offset them by comments [Caltech Structure].
See my comment, 4.3.2.

After deciding on an order, then you can decide to separate your methods one of these ways: