ISO 9126 defines portability as:
A set of attributes that bear on the ability of software to be transferred from one environment to another.
The word "environment" is not defined, but it typically refers to one of the following:
Operating systems, e.g. Mac-OS, NextStep, Solaris, MS-DOS.
Hardware platforms, e.g. Motorola 68K, PowerPC, Sparc, ix86.
Compiler vendors, e.g. Borland, Microsoft, IBM, Watcom.
GUI systems, e.g. OpenWindows, OSF/Motif, MS Windows, OS2/PM.
User languages, e.g. English, Swedish, French.
Presentation formats, e.g. how to display time, currency, etc.
Other aspects of the word "environment" are communications, databases and different kinds of class libraries.
Portability is an issue for all projects involving multiple "environments". In this chapter we will concentrate on the portability issues close to the C++ language. Other aspects are also relevant, but not within the scope of this book.
Many aspects of C++ are inherently non-portable. They are called either undefined, unspecified or implementation-defined parts of the language. Then there are pure extensions that are supplied by particular compiler vendors. You should try to avoid all extensions to C++, but if they are needed, their use must be localized to a few places in the code.
Rule 15.1 Do not depend on undefined, unspecified or implementation-defined parts of the language.
Rule 15.2 Do not depend on extensions to the language or to the standard library.
Rec 15.3 Make non-portable code easy to find and replace.
Rec 15.14, unsupported language features must be treated similarly to language extensions.
Rule 15.1 Do not depend on undefined, unspecified or implementation-defined parts of the language.
Most non-portable code generally falls into three different categories:
Implementation-defined behavior
Unspecified behavior
Undefined behavior
Implementation-defined behavior means that the code is completely legal C++, but compilers may interpret it differently. However, for each implementation-defined aspect there are only a few different ways in which compilers may differ, and the compiler vendor is required to say in the documentation what their particular compiler does. For example, it is implementation-defined whether a char object can store a negative value or not.
Implementation-defined behavior
const char c = -100;
if (c < 0)  // Implementation-defined behavior
{
    // ...
}
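A minimal sketch of how a program can ask the implementation which choice it has made instead of guessing. The helper names plainCharIsSigned and fitsInPlainChar are hypothetical and not part of the original example; CHAR_MIN, CHAR_MAX and std::numeric_limits come from the standard library.

```cpp
#include <climits>
#include <limits>

// True when plain char is a signed type in this implementation.
inline bool plainCharIsSigned()
{
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: can the value be stored in a plain char unchanged?
inline bool fitsInPlainChar(int value)
{
    return value >= CHAR_MIN && value <= CHAR_MAX;
}
```

With such a check, fitsInPlainChar(-100) is true only where plain char is signed, which is exactly the difference the example above depends on.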
Unspecified behavior also means that the code is completely legal C++, but compilers may interpret it differently. The difference between implementation-defined behavior and unspecified behavior is that the compiler vendor is not required to describe what their particular compiler does. For example, when you cast an integer to an enum, the resulting enum value may in some cases be unspecified.
enum BasicAttrType
{
    // ...
    counterGauge = 0x1000,  // 4096
    counterPeg = 0x2000,    // 8192
    counterAcc = 0x3000     // 12288
};

BasicAttrType t = (BasicAttrType) 10000;  // t has unspecified value
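One way to avoid depending on the result of such a cast is to convert only values that actually name an enumerator. The following is a sketch under that assumption; the helper toBasicAttrType is hypothetical, and the enumerators elided in the example above are left out.

```cpp
enum BasicAttrType
{
    counterGauge = 0x1000,
    counterPeg = 0x2000,
    counterAcc = 0x3000
};

// Hypothetical helper: stores the converted value in out and returns true
// only if raw is the value of one of the enumerators; otherwise out is
// left untouched and false is returned.
inline bool toBasicAttrType(int raw, BasicAttrType& out)
{
    switch (raw)
    {
    case counterGauge:
    case counterPeg:
    case counterAcc:
        out = static_cast<BasicAttrType>(raw);
        return true;
    default:
        return false;
    }
}
```

The caller has to handle the failure case explicitly, which makes the non-portable conversion visible in the code.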
Undefined behavior means that code is not correct C++. The standard does not specify what a compiler shall do with such code. It may ignore the problem completely, issue an error or something else. For example, it is undefined what happens if you dereference a pointer returned from a request for zero bytes of memory.
char* a = new char[0];
cout << *a << endl;  // Undefined behavior
All programs with any ambition of being portable should of course avoid all dependencies on such parts of the language. The problem is that very few programmers know all these parts of C++. Fortunately, many portability problems are so obscure that they seldom cause trouble. In the rest of this chapter we will describe the most common ones.
In general you should stay within the areas of the language that you as an individual programmer know well, and take a look in a book or the language specification itself if you are doing something new that is likely to be non-portable.
Rule 15.2 Do not depend on extensions to the language or to the standard library.
Extensions to C++ are sometimes necessary. A fully portable program should of course not depend on such features, but for various reasons their use is sometimes unavoidable. In that case, macros can be used to keep the rest of the code portable.
An extension provided by many compilers for DOS and MS-Windows is far and near pointers. By specifying the type of the pointer it is sometimes possible to generate more efficient code for a segmented architecture such as the 80x86 family of processors.
A near pointer is a 16-bit pointer that can be used to access objects within a 64K segment.
char __near* np;
A far pointer is a 32-bit pointer that can access any available memory area.
char __far* fp; // sizeof(fp) != sizeof(np)
Portable code must have macros that make it possible to remove these non-standard keywords when compiling on other platforms.
#ifdef UNIX
#define FAR
// ...
#else
#define FAR _far
#endif

char FAR* fp;  // This will now be OK on a UNIX computer
Rec 15.3 Make non-portable code easy to find and replace.
Sometimes you are forced to write non-portable code. The best way out of this is to use such features in a way so that a new definition of a macro or a typedef, or the replacement of a file, makes the code work in the new environment. The general trick is to isolate such code as much as possible so that it is easy to find and replace.
#ifdef INT32
typedef int sint32;
#else
typedef long sint32;
#endif

sint32 result = 1234 * 567;  // result should
To avoid platform-specific behavior, you must choose a suitable representation for the sint32 typedef. Depending on how large the integral types are, you could e.g. choose between an int or a long .
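One possible way to pick the representation, sketched here, is to let the preprocessor test INT_MAX from the standard header instead of relying on a hand-set macro. The static_assert check assumes a compiler that supports C++11, and the function multiply is a hypothetical helper added only to show the typedef in use.

```cpp
#include <climits>

// Choose int when it can hold 32-bit values, otherwise fall back to long,
// which is guaranteed to be at least 32 bits wide.
#if INT_MAX >= 2147483647
typedef int sint32;
#else
typedef long sint32;
#endif

// Fails at compile time if the chosen type is too small (C++11 assumed).
static_assert(sizeof(sint32) * CHAR_BIT >= 32,
              "sint32 must hold at least 32 bits");

// Hypothetical helper: the multiplication is done in sint32, not in int.
inline sint32 multiply(sint32 a, sint32 b)
{
    return a * b;
}
```

All the platform-specific knowledge is kept in one place, so moving to a new environment means changing at most these few lines.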
There are a few non-portable aspects of file inclusion, such as when to write "" or <> , and what can be inside of such include brackets.
Rule 15.4 Headers supplied by the implementation should go in <> brackets; all other headers should go in "" quotes.
Rec 15.5 Do not specify absolute directory names in include directives.
Rec 15.6 Include file names should always be treated as case sensitive.
Rule 2.1, what to include.
Rule 15.4 Headers supplied by the implementation should go in <> brackets; all other headers should go in "" quotes.
All classes and functions in the C++ standard library require the inclusion of a header before they can be used. A header is usually a source file, but it does not have to be. It is recommended to include only standard headers with <>. It is implementation-defined what happens if a name not defined by the standard appears within <>. All non-standard header files should be included with "" quotes to avoid such implementation-defined behavior. Most compilers allow both ways, since other standards, such as POSIX, recommend the use of <> for inclusion.
Good and bad way of including files
// Only include standard header with <>
#include <iostream.h>  /* OK: standard header */
#include <MyFile.hh>   /* NO: non-standard header */

// include any header with ""
#include "stdlib.h"    /* NO: better to use <> */
#include "MyFile.hh"   /* OK */
Rec 15.5 Do not specify absolute directory names in include directives.
You should also avoid using directory names in the include directive, since it is implementation-defined how files are found in such circumstances. Most modern compilers allow relative path names with / as separator, because such names have been standardized outside the C++ standard, for example in POSIX. Absolute path names and path names with other separators should always be avoided, though.
The file will be searched for in an implementation-defined list of places. Even if one compiler finds this file there is no guarantee that another compiler will. It is better to specify to the build environment where files may be located, since then you do not need to change any include-directives if you switch to another compiler.
Directory names in include directives
#include "inc/MyFile.hh"      /* Not recommended */
#include "inc\MyFile.hh"      /* Not portable */
#include "/gui/xinterface.h"  /* Not portable */
#include "c:\gui\xinterf.h"   /* Not portable */
Rec 15.6 Include file names should always be treated as case sensitive.
Some operating systems, such as DOS, Windows NT and Vax-VMS, do not have case-sensitive file names. When writing programs for such operating systems, the programmer can spell the name of an included file in many different ways.
If you are inconsistent, your code will be difficult to port to an environment with case-sensitive file names. Therefore you should always include a file as if its name were case sensitive. You should look at the documentation for the class if you are uncertain.
Case-sensitivity of header file name
// Includes the same file on Windows NT, but not on UNIX.
#include <Iostream.h>
#include <iostream.h>
#include <iostream.H>
The size and layout of objects is implementation-defined in C++ so that compiler vendors can generate code that is as efficient as possible. This is one of the most powerful parts of C++, as well as one of the most error-prone ones. A few rules and recommendations are needed in order to steer clear of portability problems.
Rule 15.7 Do not make assumptions about the size of or layout in memory of an object.
Rule 15.8 Do not cast a pointer to a shorter quantity to a pointer to a longer quantity.
Rec 15.9 If possible, use plain int to store, pass or return integer values.
Rec 15.10 Do not explicitly declare integral types as signed or unsigned .
Rule 15.11 Make sure all conversions of a value of one type to another of a narrower type do not slice off significant data.
Rec 15.12 Use typedefs or classes to hide the representation of application-specific data types.
Rec 6.1 - Rec 6.3, how to use casts.
Rec 7.3 - Rec 7.5, how to pass arguments.
Rule 15.7 Do not make assumptions about the size of or layout in memory of an object.
The sizes of built-in types are different in different environments. For example, an int may be 16, 32 or even 64 bits long. The layout of objects is also different in different environments, so it is unwise to make any kind of assumption as to the layout in memory of objects, such as when lumping together different data in a struct.
struct PersonRecord
{
    char ageM;
    unsigned int phoneNumberM;
    EmcString nameM;
};
A compiler is entitled to significant freedom when laying out such data in memory to find the most efficient solution. The exact address of the ageM , phoneNumberM and nameM data members within an object of type PersonRecord can vary between different environments.
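The freedom the compiler has can be observed without assuming any particular layout. The sketch below uses a hypothetical struct Sample (EmcString is left out so that the example is self-contained) together with sizeof and the standard offsetof macro; the only relations it relies on are those that hold in every environment.

```cpp
#include <cstddef>

// Hypothetical record with two members of different size.
struct Sample
{
    char tagM;
    unsigned int numberM;
};

// Padding may be inserted between or after the members, so the size of
// the struct is only known to be at least the sum of the member sizes.
inline bool hasPadding()
{
    return sizeof(Sample) > sizeof(char) + sizeof(unsigned int);
}

// Where numberM ends up inside the object is decided by the compiler.
inline std::size_t numberOffset()
{
    return offsetof(Sample, numberM);
}
```

Code that needs the position of a member should ask for it in this way rather than compute it by adding up member sizes.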
Rule 15.8 Do not cast a pointer to a shorter quantity to a pointer to a longer quantity.
Certain types have alignment requirements. An alignment requirement is a requirement on the addresses of objects. For example, some architectures require that objects of a certain size start at an even address. It is a fatal error if a pointer to an object of that size points to an odd address. For example, you might have a char pointer and want to convert it to an int pointer. If the pointer points at an address that is illegal for an int, dereferencing the int pointer will give a run-time error.
Cast must obey alignment rules
int stepAndConvert(const char* a, int n)
{
    const char* b = a + n;  // step n chars ahead
    return *(int*) b;       // NO: Dangerous cast of const char* to int*
}
Calling stepAndConvert() will probably give a run-time error for many combinations of the two parameters (a, n).
const char data[] = "abcdefghijklmnop";
int anInt = 3;
int i = stepAndConvert(data, anInt);  // NO: May crash
This kind of code is unlikely to work, but if it does, it will certainly not be portable.
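If the bytes of an int really must be read from an arbitrary position in a char buffer, a common alternative is to copy them into a properly aligned int object. The sketch below uses std::memcpy for this; the function name stepAndCopy is hypothetical, and the caller is assumed to guarantee that at least sizeof(int) bytes are available at the given position. The value obtained still depends on the size and byte order of int.

```cpp
#include <cstring>

// Copies sizeof(int) bytes, starting n chars into a, into an int object.
// No int pointer to a possibly misaligned address is ever created.
inline int stepAndCopy(const char* a, int n)
{
    int result;
    std::memcpy(&result, a + n, sizeof result);
    return result;
}
```

This removes the alignment problem, but not the dependency on the representation of int, so such code should still be isolated.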
Rec 15.9 If possible, use plain int to store, pass or return integer values.
Plain int is the most efficient integral type on most systems, since it has the natural word size suggested by the machine architecture. A rule of thumb is that fewer machine instructions are needed when the operands have the natural word size of the processor. There are however exceptions, like the Alpha processor from Digital, which has 32-bit ints, 64-bit long ints and a natural word size of 64 bits. In most cases, though, you should have a good reason if you select any other type.
Selecting a short int instead of a plain int does not make sense unless you are very tight on memory, and a long int should only be used if it will hold values so large that plain ints are not big enough.
Rec 15.10 Do not explicitly declare integral types as signed or unsigned.
It is also best to avoid using explicitly signed or unsigned integral types, since mixing them in expressions may give you non-trivial arithmetic conversions that are tricky to understand.
Mixing signed and unsigned integers
The standard header limits.h defines a number of constants that describe the range of the built-in types, for example INT_MIN, INT_MAX and UINT_MAX. If you work with very large numbers, be sure to check against these values.
// Suppose int and unsigned int are 32 bits long.
// From a typical limits.h file:
// #define INT_MIN -2147483648
// #define INT_MAX 2147483647
// #define UINT_MAX 4294967295

int i = 42;
unsigned ui = 2222222242;
int j = i - ui;  // NO: Result -2222222200 is out of range!!!
                 // j has value: 2072745096 !!!
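When a larger value is subtracted from a smaller value in an unsigned type, the result wraps around, so its value depends on the implementation-defined size of the type. Converting such a result to a signed type that cannot represent it gives an implementation-defined value. Plain chars are particularly problematic, since it is implementation-defined if they are signed or unsigned.
chars can be signed or unsigned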
char zero = 0;
char one = 1;
char minusOne = zero - one;    // NO: result has implementation-defined value
char result = one + minusOne;  // result is not always equal to zero
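Returning to the mixing of signed and unsigned integers, comparisons are affected in the same way as subtraction: the int operand is converted to unsigned int before the comparison is made. The sketch below spells out that conversion and shows one possible way to compare the mathematical values instead. Both function names are hypothetical.

```cpp
// What the built-in comparison does when the operands are mixed:
// the int operand is first converted to unsigned int.
inline bool mixedLess(int a, unsigned int b)
{
    return static_cast<unsigned int>(a) < b;
}

// Hypothetical helper that compares the mathematical values instead.
// A negative a is always smaller than any unsigned value.
inline bool valueLess(int a, unsigned int b)
{
    return a < 0 || static_cast<unsigned int>(a) < b;
}
```

With these definitions mixedLess(-1, 1u) is false, since -1 becomes the largest unsigned value, while valueLess(-1, 1u) is true.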
Rule 15.11 Make sure all conversions of a value of one type to another of a narrower type do not slice off significant data.
Converting values from a wider to a narrower type is potentially unsafe since significant data may be lost.
Most compilers will warn about dangerous conversions and you should try to rewrite the code if that is necessary to avoid them. You could, for example, use a data type with larger range.
You could also look through your code to see whether such dangerous conversions are possible.
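Where a narrowing conversion cannot be avoided, the range can be checked before the conversion is made. The following is a minimal sketch of such a check for long to short; narrowToShort is a hypothetical helper, and SHRT_MIN and SHRT_MAX come from the standard header.

```cpp
#include <climits>

// Hypothetical helper: stores value in out and returns true only if the
// value can be represented by a short; otherwise out is left untouched.
inline bool narrowToShort(long value, short& out)
{
    if (value < SHRT_MIN || value > SHRT_MAX)
    {
        return false;
    }
    out = static_cast<short>(value);
    return true;
}
```

The explicit cast documents that the conversion is intentional, and the check guarantees that no significant data is sliced off.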
One example is the UNIX system call fork(), which returns a value of the type given by the typedef pid_t. Some systems define pid_t as a short.
// fork() returns pid_t that is sometimes a short
short int pid1 = fork();  // NO: should use pid_t
If a typedef is provided, you should always use it instead of the actual type. In this particular case, we should use pid_t .
pid_t pid2 = fork(); // Recommended
Rec 15.12 Use typedefs or classes to hide the representation of application-specific data types.
An application-specific type is used to store a quantity whose representation varies between different environments. By providing a typedef or a class it is possible for the programmer to write more portable code. Such types should only be used when there is a real need for them. Typedefs make the code more difficult to read, and classes can have a negative impact on performance.
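As an illustration, a small class can hide which built-in type is used to store a value. The class PhoneNumber below is hypothetical and only sketches the idea; the choice of unsigned long as the stored type is an assumption that could be changed in one place.

```cpp
// Hypothetical application-specific type. Users only see the interface,
// so the stored type can be replaced when porting to a new environment.
class PhoneNumber
{
public:
    explicit PhoneNumber(unsigned long digits) : digitsM(digits) {}

    unsigned long digits() const { return digitsM; }

    bool operator==(const PhoneNumber& other) const
    {
        return digitsM == other.digitsM;
    }

private:
    unsigned long digitsM;  // representation, not visible to users
};
```

A plain typedef would be cheaper, but it would not stop users from mixing the value with ordinary integers.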
A common problem is having to use compilers that do not implement all features of the language. By looking ahead you can avoid many future problems today.
Rec 15.13 Always prefix global names (such as externally visible classes, functions, variables, constants, typedefs and enums) if namespace is not supported by the compiler.
Rec 15.14 Use macros to prevent usage of unsupported keywords.
Rec 15.15 Do not reuse variables declared inside a for-loop.
Rec 1.4, names that should be put in namespaces.
Rule 4.1, how to write a for-loop.
Rec 15.13 Always prefix global names (such as externally visible classes, functions, variables, constants, typedefs and enums) if namespace is not supported by the compiler.
It is possible to avoid name clashes by putting declarations and definitions inside namespaces. Without namespaces, most definitions and declarations will be global. In such cases name clashes are avoided by adding a unique prefix to each global name.
Other solutions, such as putting declarations and definitions inside classes as static members should be avoided unless there is a close relationship between the nested identifier and the class.
EmcString famousClimber = "Edmund Hillary"; // Uses Emc as prefix
Rec 15.14 Use macros to prevent usage of unsupported keywords.
The C++ standard has added many new keywords to the language. The current list contains 63 keywords.
The language also provides textual, alternative representations for some of the operators.
Alternative   Operator
and           &&
and_eq        &=
bitand        &
bitor         |
compl         ~
not           !
not_eq        !=
or            ||
or_eq         |=
xor           ^
xor_eq        ^=
None of these names are legal to use as identifiers, but many compilers are not up-to-date with the standard.
Unsupported keyword as empty macro
If your compiler for example does not support the keyword explicit that is used to prevent a constructor from defining an implicit conversion, it is useful to define an empty macro with the same name as the keyword.
#ifdef NO_EXPLICIT
#define explicit
#endif
By doing so you prevent many future problems that will result from using the keyword incorrectly.
EmcString explicit;  // Error: explicit is keyword
                     // will not compile if explicit defined as macro
An additional benefit is that you can use a keyword in places where it is intended to be used.
class EmcArray
{
public:
    explicit EmcArray(size_t size);
    // ...
};
However, the macro does not behave as the keyword does, since it does not stop the constructor from acting as an implicit conversion from the type of the parameter to the type of the class. The macro only serves as a way for the implementor of the class to tell the user that the constructor should not be used for implicit conversions.
Here are some other useful macro-definitions and typedefs:
#ifdef NO_BOOL
typedef int bool;
const bool false = 0;
const bool true = 1;
#endif

#ifdef NO_MUTABLE
#define mutable
#endif

#ifdef NO_EXCEPTION
#define throw(E) abort();
#define try
#define catch(T) if (0)
#endif
The standard library defines numerous names that should also be avoided. Most of them will be put inside the namespace std, so the chance of getting into trouble will be less. We do not list all names in the book since the list contains more than 800 names. It is also unlikely that anyone would want to spend time checking that list while reviewing code.
Rec 15.15 Do not reuse variables declared inside a for-loop.
The scope of a variable declared inside a for-statement has been changed by the C++ standard. Previously such a variable belonged to the enclosing scope, but now it belongs to the block following the for-statement. This means that a variable declared in a for-loop can no longer be reused in the enclosing scope. If you want to reuse a loop variable you need to move the declaration outside the for-loop.
int i = 0;
for(; i < last(); i++)
{
    // ...
}
for(; i >= first(); i--)
{
    // ...
}
Some parts of C++ have never been clearly specified. This is particularly true for templates. Such parts of C++ should be handled with care, since compilers often handle them differently. The best thing to do is to have a design that is as good as possible and code that can be compiled for the platforms chosen. Another solution is to only use compilers that implement templates the same way, or only use one compiler. If that is not possible, you must either restrict yourself to those parts of the language that are implemented by all compilers, or try to make your code easy to modify for new platforms.
Rec 15.16 Only inclusion of the header file should be needed when using a template.
Rec 15.17 Do not rely on partial instantiation of templates.
Rec 15.18 Do not rely on the lifetime of temporaries.
Rec 15.19 Do not use pragmas.
Rule 15.20 Always return a value from main().
Rec 15.21 Do not depend on the order of evaluation of arguments to a function.
Rec 2.5, how to organize templates.
Rec 7.3 - Rec 7.5, argument passing.
Rec 15.16 Only inclusion of the header file should be needed when using a template.
How should you organize your templates?
A template has an interface and an implementation just as any class or function. A template is similar to an inline function: the compiler must see both the interface and the implementation when code is generated.
A template is automatically instantiated for all template arguments that the program uses. It is also possible to request it to be instantiated for a particular set of arguments. The reason why you would want such explicit instantiations is to reduce the compile time of your program.
// emcMax is function template
template<class T>
const T& emcMax(const T& a, const T& b)
{
    return (a > b) ? a : b;
}

void foo(int i, int j)
{
    int m = emcMax(i, j);  // usage of emcMax
}

EmcQueue<int> q;  // usage of class EmcQueue<int> and
                  // EmcQueue<int>'s default constructor
q.insert(42);     // usage of EmcQueue<int>::insert

template class EmcQueue<char>;  // Explicit instantiation
There is no standard for how template source code is organized and how much of a template to instantiate for a particular set of arguments.
A function template is used when it is called, or its address is taken. A class template is used when instances of the class template are used to declare objects.
Some compilers require that the implementation either be part of the header file or be included by the header file.
Other compilers use file-name conventions to determine where to find the implementation. The implementation should be in a file with the same name as the header file, but with the implementation file extension substituted for the header file extension.
This is a potential portability problem when writing code using templates. We recommend always putting the implementation in a separate file, a template definition file. By using conditional compilation to control whether this file is included or not, the same source code can be used with different compilers.
By having a macro EXTERNAL_TEMPLATE_DEFINITION it is possible, at compile-time, to control whether the implementation file is included by the header file or not.
template <class T>
class EmcQueue
{
    // ...
};

#ifndef EXTERNAL_TEMPLATE_DEFINITION
#include "EmcQueue.cc"
#endif
Rec 15.17 Do not rely on partial instantiation of templates.
A difference between compilers that is more difficult to handle is how much of a template class is instantiated.
Some compilers allow a template class to be instantiated for types that do not provide all operators or member functions needed by the implementation.
As long as you do not use the part of the implementation that requires these, no error is reported by these compilers. This is called partial instantiation.
Other compilers instantiate all members of a template class. Therefore, the template argument must support all uses of the type, even if only a few of the member functions are used. The only solution that always works is to avoid relying on partial instantiation; i.e. always assume that all member functions are instantiated.
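One way to check that a type supports everything a template needs, regardless of how much a particular compiler instantiates, is to request an explicit instantiation, which generates all members of the class. The sketch below assumes this approach; the class template EmcPair is hypothetical and exists only for the illustration.

```cpp
// Hypothetical class template; smallest() requires operator< for T.
template <class T>
class EmcPair
{
public:
    EmcPair(const T& a, const T& b) : firstM(a), secondM(b) {}

    const T& first() const { return firstM; }

    const T& smallest() const
    {
        return (secondM < firstM) ? secondM : firstM;
    }

private:
    T firstM;
    T secondM;
};

// Explicit instantiation: every member function is generated, so this
// line only compiles if int supports all operations used by EmcPair.
template class EmcPair<int>;
```

If the explicit instantiation compiles for a given type, the code does not rely on partial instantiation for that type.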
Rec 15.18 Do not rely on the lifetime of temporaries.
Temporary objects are often created in C++, such as when a function returns a value, or when a parameter to a function is passed by value. The lifetime of temporaries was implementation-defined for a long time, but it has now been decided that they must persist at least until the end of the full expression in which they were created. Unfortunately, it is possible that your compiler still does not implement that behavior. Therefore you should take great care not to depend on the lifetime of temporaries.
Temporary objects are often created when operating upon objects that store values, such as strings. If the class also provides a conversion operator that returns a pointer or reference to the representation, then you have potentially dangerous code.
class DangerousString
{
public:
    DangerousString(const char* cp);
    operator const char*() const;  // conversion operator gives access to data member
    // ...
};
The conversion operator to const char* is used to access the representation of the string so that it can be printed by calling ostream::operator<<(const char*). The problem with this is that the DangerousString object to be printed could be a temporary, for example if it stores the result of an expression. Since the lifetime of such objects varies between implementations, there is a risk that the pointer becomes invalid before it is used.
DangerousString operator+(const DangerousString& left, const DangerousString& right);

DangerousString a = "This may go";
DangerousString b = " wrong";
cout << a << endl;      // OK
cout << a + b << endl;  // Dangerous
The solution for avoiding the problem in this particular case is to add an output operator for DangerousString -objects. Since a reference to the temporary is passed to the function, the compiler must guarantee that the object bound to that reference exists until the function returns.
ostream& operator<<(ostream& o, const DangerousString& s);
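A minimal sketch of a string class that follows this advice is shown below. The class SaferString, its use of std::string as representation and the helper render are all hypothetical; the point is only that the class has an output operator taking a reference and no conversion operator to const char*.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical string class without a conversion operator to const char*.
class SaferString
{
public:
    SaferString(const char* cp) : textM(cp) {}
    const std::string& text() const { return textM; }

private:
    std::string textM;
};

inline SaferString operator+(const SaferString& left, const SaferString& right)
{
    return SaferString((left.text() + right.text()).c_str());
}

// The temporary is bound to the reference s for the whole call.
inline std::ostream& operator<<(std::ostream& o, const SaferString& s)
{
    return o << s.text();
}

// Usage: the result of a + b is a temporary that is printed directly.
inline std::string render(const SaferString& a, const SaferString& b)
{
    std::ostringstream out;
    out << a + b;
    return out.str();
}
```

Since no pointer into the temporary ever leaves the output operator, the code does not depend on how long the temporary lives after the call.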
Rec 15.19 Do not use pragmas.
A pragma is usually a way to control the compilation process, such as disabling optimization of a particular function, or forcing an inline function to become inline in cases where the compiler would normally refuse to make it inline.
Everything about pragmas is implementation-defined, so they are perhaps the most non-portable feature of C++. The preprocessor will handle them if it can understand them, and otherwise they will just be ignored. You cannot be completely sure a new compiler will understand any pragmas in your code.
It is only OK to use pragmas as long as your code will work correctly without them. Therefore you should only use them sparingly and always document why and where they are used.
The pragma once was previously provided by the g++ compiler as a way for the programmer to tell the preprocessor which files are include files. A file containing the pragma is only included once.
#pragma once /* NO: not portable! */
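The same effect can be had with an include guard, which uses only standard preprocessor directives. In the sketch below the guarded text is written out twice to imitate a header that is included two times; the macro name EMC_LIMITS_HH and the struct EmcLimits are hypothetical.

```cpp
// First inclusion: EMC_LIMITS_HH is not yet defined, so the contents
// are compiled and the guard macro is defined.
#ifndef EMC_LIMITS_HH
#define EMC_LIMITS_HH
struct EmcLimits
{
    int maxItems;
};
#endif

// Second inclusion of the same text: the guard macro is now defined,
// so everything up to #endif is skipped and nothing is redefined.
#ifndef EMC_LIMITS_HH
#define EMC_LIMITS_HH
struct EmcLimits
{
    int maxItems;
};
#endif
```

Without the guard the second copy would be a redefinition of EmcLimits and the code would not compile.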
Rule 15.20 Always return a value from main().
The standardization committee for C++ has decided that the return values of functions must always be declared. Functions without return values were previously assumed to return an int . Therefore you now have to declare main to return an int and you should also always return a value. This is good, since in many environments this return value is checked by other programs.
int main()  // Yes
{
    // ...
    return 0;  // Yes
}
Rec 15.21 Do not depend on the order of evaluation of arguments to a function.
Another area where compilers differ is the order of evaluation of function arguments.
func(f1(), f2(), f3());  // f1 may be evaluated before f2 and f3,
                         // but don't depend on it!
The order of evaluation of subexpressions within a larger expression is in many cases also unspecified. A portable program should not depend on any specific order.
Evaluation order of subexpressions
a[i++] = i;  // NO: i may be incremented before or
             // after its value is used on the right
             // side of the assignment.
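When the order matters, it can be fixed by giving each step its own statement, since statements are executed one after another. The sketch below shows this for both function arguments and the assignment above; all function names are hypothetical.

```cpp
// Hypothetical function with a side effect on its argument.
inline int nextValue(int& counter)
{
    return ++counter;
}

inline int combine(int a, int b, int c)
{
    return a * 100 + b * 10 + c;
}

// Each call is made in its own statement, so the order is fixed
// before the values are passed on as arguments.
inline int portableCall(int& counter)
{
    const int first = nextValue(counter);
    const int second = nextValue(counter);
    const int third = nextValue(counter);
    return combine(first, second, third);
}

// The use of i and the increment of i are separated into two statements.
inline void storeThenAdvance(int* a, int& i)
{
    a[i] = i;
    ++i;
}
```

Writing combine(nextValue(c), nextValue(c), nextValue(c)) instead would leave it to the compiler to decide which call is made first.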