Wednesday, May 29, 2013

The futility of catching std::bad_alloc

After debating the merits of testing for NULL after calling operator new (see this post), the first reaction is to start coding like this:

   try {
      char* p = new char[256];
      ...many lines of code
   }
   catch( std::bad_alloc& b ) {
      // error handling
   }

For most applications out there, this is just clutter. It makes the code more difficult to read without providing any benefit to justify the added lines of code (disclaimer: I understand that for some applications careful memory management is important - on the other hand, these types of applications will likely not use the global operator new).

It will also make the code bigger. Everything inside the catch block will generate object code, but will rarely, if ever, be executed. 

And, if it is executed, what exactly are we supposed to do inside the catch block?

It will rarely, if ever, be executed

Modern operating systems will allocate lots of virtual memory to a process before memory allocation fails. This is a test program that asks for 4 GB of memory (the size of the physical memory on the machine I tested it on). 
Side note: the strcpy in the code is just to use the variable p for something. Without it, the compiler may optimize away the (unused) variable and skip the memory allocation altogether.
 #include <new>  
 #include <iostream>  
 #include <string.h>  
   
 using namespace std;  
   
 int main(int argc, const char* argv[]) {  
   for( int i = 0; i < 4000; i++ ) {  
    try {  
      cout << "Allocated " << i+1 << " MB so far" << endl;  
      char* p = new char[1024 * 1024];  
      strcpy( p, "test" );  
    } catch (std::bad_alloc& b) {  
      cout << "Finally failed" << endl;  
      break;  
    }  
   }  
   
   cout << "Press enter to exit";  
   cin.ignore();  
 }  


And here is the result: the operating system (OS X 10.10 in this case) gladly gave 4 GB of memory to the program without ever throwing std::bad_alloc:

   ...
   Allocated 3998 MB so far
   Allocated 3999 MB so far
   Allocated 4000 MB so far

   Press enter to exit

And the operating system reports most of the memory as virtual memory.



Once a process gets to that point, paging will be a serious problem. The process will run slowly, to the point of being useless (and will very likely affect other processes on the same machine).

The exact behavior varies by operating system and the configuration of the machine. This is what happens on Ubuntu running under VirtualBox, configured with 2 GB of base memory:

   ...
   Allocated 3053 MB so far
   Allocated 3054 MB so far
   Allocated 3055 MB so far
   Finally failed
   Press enter to exit

Although in this case the operating system did eventually return an error, the point is the same: it will give gobs of memory to the process before it fails.

What exactly are we supposed to do inside the catch block?

If we do eventually get a std::bad_alloc, what exactly can or should we do inside the catch block?

The first answer is usually to log something and return an error:

   catch( std::bad_alloc& b ) {
      ...log error
      return ERR_FAILED_TO_ALLOCATE_MEMORY;
   }

There are a few problems with that:


  • If logging an error requires memory (e.g. allocating buffers for the logging system), the logging will also fail and throw std::bad_alloc.
  • What exactly is the caller function supposed to do with ERR_FAILED_TO_ALLOCATE_MEMORY? We are just passing the buck up the chain.


It is hard to find anything useful or guaranteed to work to put inside the catch block.

In most cases, the better alternative is to let the program die and restart it in a clean state. Anything else will most likely result in a program that limps along, paging most of the time instead of doing useful work.
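A minimal sketch of that approach (not from the original post): do not catch std::bad_alloc at all. When the exception propagates out of main, std::terminate is called and the process dies; a supervisor (a shell loop, launchd, systemd, ...) can restart it in a clean state.

 #include <vector>

 int main() {
   // Deliberately leaky, with no try/catch anywhere. When operator new finally
   // throws std::bad_alloc, nothing catches it, std::terminate() runs and the
   // process exits - ready to be restarted by an external supervisor.
   std::vector<char*> blocks;
   for(;;)
     blocks.push_back( new char[1024 * 1024] );
 }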

What about a new-handler function?

This is the point where a well-informed C++ programmer will ask if we can do anything useful by installing a new-handler.

But that again raises the question: what is "useful" under these circumstances? The new-handler can release some memory back to the system to allow operator new to succeed.
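For reference, this is roughly what that idea looks like (a sketch, not a recommendation; remember that operator new calls the handler and then retries the allocation):

 #include <new>
 #include <cstdlib>

 static char* stash = new char[1024 * 1024];  // "emergency" memory reserved at startup

 void release_stash() {
   if( stash ) {
     delete[] stash;   // free the stash; operator new will retry the allocation
     stash = 0;
   } else {
     std::abort();     // the stash is already spent: give up
   }
 }

 // ...early in main():  std::set_new_handler( release_stash );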

There are again problems with that:

  • Where does the memory come from? It is probably from a stash the program saved earlier on. The stash cannot be all that big or it will just make the problem happen sooner.
  • Does it (really) solve the problem? If we have a real leak, releasing a stash of memory saved earlier will just allow the program to run a little longer before it fails yet again.

If there is anything useful to be done in a new-handler, it is to document the problem clearly so we can debug it more easily later.

This usually means two things:
  • Log a clear error message. Logging the message itself may require some memory, so we need to save a small amount of memory and release it later to ensure our logging functions work.
  • Generate a core dump file for systems that support it. This one can be tricky because the core dump will be very large (it has the entire memory footprint for the process, which by now must be large, since we just ran out of memory).
Here is a sample program that installs a new-handler. It reserves a bit of memory up front to make sure the logging works, and it generates a core dump by calling abort():

 #include <new>  
 #include <iostream>  
 #include <string.h>  
 #include <stdlib.h>  
   
 using namespace std;  
   
 static char* p = new char[1024*10];  
 void new_failed() {  
   // Make sure we have some memory to work with in this function  
   delete[] p;  
   // Log a clear message about the problem we hit  
   cout << "Ran out of memory" << endl;  
   // Will generate a core dump in some systems - useful to debug later  
   // ...but be careful because it will likely be a huge file because of  
   // the amount of memory allocated so far  
   abort();  
 }  
   
 int main(int argc, const char* argv[]) {  
   
   set_new_handler(new_failed);  
   
   for( int i = 0; i < 4000; i++ ) {  
    try {  
      cout << "Allocated " << i+1 << " MB so far" << endl;  
      char* p = new char[1024 * 1024];  
      strcpy( p, "test" );  
    } catch (std::bad_alloc& b) {  
      cout << "Finally failed" << endl;  
      break;  
    }  
   }  
   
   cout << "Press enter to exit";  
   cin.ignore();  
 }  
   


This is what happens when it fails:

   ...
   Allocated 3053 MB so far
   Allocated 3054 MB so far
   Allocated 3055 MB so far
   Ran out of memory

   Aborted (core dumped)


Tools for memory leak detection

A process that runs out of memory has a memory leak in 99.999% of the cases.

There are many great tools out there to find memory leaks. My favorite is valgrind (side note: here is how to pronounce the name of that tool).
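For reference, a typical valgrind run looks like this (build with -g so the leak report shows file names and line numbers; the file names are illustrative):

   g++ -g test.cpp -o test
   valgrind --leak-check=full ./test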

If you are developing on a Mac with Xcode, run your project with Product > Profile and choose the Leaks instrument (under Memory).

The futility of checking for null after calling new

Before reading further: we are talking about the global operator new with default options. The nothrow version will return a null pointer on failure and an overridden new operator may do pretty much as it wishes.
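For contrast, this is what the nothrow form looks like; only there does checking for a null pointer make sense (a minimal sketch):

   #include <new>

   char* p = new (std::nothrow) char[256];
   if( p == NULL ) {
      // reached only with the nothrow form; the default operator new throws instead
   }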

A piece of code that is still easy to find out there:

   char* p = new char[256];
   if( p == NULL ) {
      // error handling
   }

It is "C disguised as C++" code. In C malloc does return null pointer on failure, but all modern C++ compilers do what the C++ standard asks for: throw std::bad_alloc when it fails to allocate memory. The global operator new will never return a null pointer.

These lines...


   if( p == NULL ) {
      // error handling
   }

...are dead code in C++. The if statement simply costs time, and the block of code after it will never be executed.

This got more attention in a project where we decided to measure code coverage using automated tests. To reach our coverage goal (a certain percentage of the total lines of code), we started removing dead pieces of code wherever we could find them. This was one of them. [We also added more automated tests, of course ;)]

Once this is understood, the first reaction is to rewrite the code:

   try {
      char* p = new char[256];
   }
   catch( std::bad_alloc& b ) {
      // error handling
   }

This is (for most programs out there) yet another exercise in futility, discussed here.

Wednesday, May 22, 2013

C++: generating assembly code with gcc/g++ for fun and profit

While discussing with a colleague how to use strncpy in a really safe way (see this post), we got stuck on the performance part: is one of the alternatives faster than the other?

More precisely, is this version (the potentially unsafe one) faster because it uses a constant that can be resolved at compile time for sure?

   strncpy( s.name, "test", MAX_NAME_SIZE );

Or is this version equally fast because sizeof is also resolved at compile time?


   strncpy( s.name, "test", sizeof s.name );

This could be settled by going to the C++ standard documentation, but you have to pay for it.

A free copy of the C99 standard is available here, but it has two problems:


  1. It's not the C++ standard. It should be about the same in this area, but there is no guarantee.
  2. There is still room for interpretation in the text (see below). Later examples in the document give more evidence that sizeof is evaluated at compile time, but you really need to dig into it to make the point.

The text from the document:


6.5.3.4 The sizeof and _Alignof operators
... 
2 The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
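As a quick side check (a sketch, assuming a C++11 compiler; not part of the original argument), the fact that sizeof yields an integer constant means it can be used wherever the compiler demands a compile-time constant:

   struct something { char name[101]; };
   something s;

   char scratch[ sizeof s.name ];                        // array bound: requires a constant
   static_assert( sizeof s.name == 101, "size check" );  // compile-time assertion (C++11)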
After going through all that, we decided to settle the argument in a simpler way:


  • Get the assembly code for both versions
  • If they are the same, then sizeof is indeed evaluated at compile time (for this case, at least) and there is no performance penalty for the safer code.
The challenge then is to get the assembly code from a C/C++ program.

This is the test program:

 #include <string.h>  
   
 const int MAX_NAME_SIZE = 100;  
 struct something {   
   char name[MAX_NAME_SIZE+1];  
 };  
   
 int main( int argc, const char* argv[] ) {   
   something s;  
   
   strncpy( s.name, "test", MAX_NAME_SIZE+1 );   
   s.name[MAX_NAME_SIZE] = '\0';  
   
   strncpy( s.name, "test", sizeof s.name );  
   s.name[sizeof s.name - 1] = '\0';  
 }

Note: unfortunately the recipe below will not work with the clang family (the default on Mac OS and others). Its assembler does not support interleaved listings. The best it can do is a plain assembly listing (more details in this SO question): g++ -g -O2 -S test.cpp, which writes the assembly to test.s.
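If GNU binutils is available (it is not, out of the box, on a Mac), one workaround is to compile with debug information and let objdump intermix the source with the disassembly - a sketch:

   g++ -g -O2 test.cpp -o test
   objdump -d -S test > test.lst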

When the program is compiled with the options below, we get the assembly listing combined with the C++ source code in the test.s file:

   g++ -g -O2 -Wa,-aslh test.cpp >test.s

Before going into the results, this is what each option does:


  • -g: Generate debug symbols. Without this option the assembler cannot match the source code lines to the generated code, and the output is much harder to read.
  • -O2: Optimization level. Strictly speaking, not needed for this exercise, but it is always a good idea to build at the same optimization level used in the released code. It can affect the resulting assembly code significantly.
  • -Wa: Pass options to the assembler. This is the hint that the next options, after the comma, should be passed on to the assembler.
  • -aslh: Assembler options. All -a options are for listings. In this case we asked to include the assembly code (-al), include symbols (-as) and, most importantly, include the source code (-ah). These options are explained in more detail in the assembler documentation; just run man as to see them.
And now, the results. What follows is a cleaned up version of the assembly listing.

First, the strncpy( ... MAX_NAME_SIZE ) version, showing that the constant is used, as expected:

  34 000e C7442408              movl    $101, 8(%esp)
  35 0016 C7442404              movl    $.LC0, 4(%esp)
  36 001e 891C24                movl    %ebx, (%esp)
  41 0021 65A11400              movl    %gs:20, %eax
  42 0027 8944247C              movl    %eax, 124(%esp)
  43 002b 31C0                  xorl    %eax, %eax
  49 002d E8FCFFFF              call    strncpy

Now the strncpy( ... sizeof s.name ) version, showing that it also uses a constant, settling the argument (same object code, thus same performance):

  55 0032 C7442408              movl    $101, 8(%esp)
  56 003a C7442404              movl    $.LC0, 4(%esp)
  57 0042 891C24                movl    %ebx, (%esp)
  58 0045 E8FCFFFF              call    strncpy





Tuesday, May 21, 2013

Writing unsafe code with strncpy

Buffer overflow is a common source of security vulnerabilities. Just look at the CERT list for current issues.

In C and C++, a common source of buffer overflows is the string manipulation functions like strcat, sprintf and strcpy.
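As a reminder of why these functions are dangerous, a minimal sketch (the variable names are illustrative): strcpy keeps copying until it finds the terminating '\0' in the source, no matter how small the destination is.

   char name[8];
   // Writes past the end of name whenever the source string (including its
   // terminating '\0') is longer than 8 bytes - a buffer overflow.
   strcpy( name, user_supplied_string );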

In an effort to write secure code, I have seen code like this, using strncpy instead of strcpy to prevent the buffer overflow:

In a header file somewhere:


   const int MAX_NAME_SIZE = 100;
   struct something
   {
      char name[MAX_NAME_SIZE+1];
   };

In some other place:

   ...
   something s;
   strncpy( s.name, source, MAX_NAME_SIZE+1 );
   s.name[MAX_NAME_SIZE] = '\0';

This is bad code.

It looks safe, but it is not.

Some day someone will make this change for a perfectly valid reason:

   const int NEW_MAX_NAME_SIZE = 10;
   struct something
   {
      char name[NEW_MAX_NAME_SIZE+1];
   };

If we are lucky, the old constant is deleted, causing a compiler error that forces a review of the code.

If the old constant is left around, we now have a very big bug: despite the use of strncpy, there is a buffer overflow in this code.
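To make the bug concrete (a sketch using the constants above): the destination is now only NEW_MAX_NAME_SIZE+1 = 11 bytes, but the call still tells strncpy it may write up to MAX_NAME_SIZE+1 = 101 bytes.

   something s;                                  // s.name is now 11 bytes long
   strncpy( s.name, source, MAX_NAME_SIZE+1 );   // writes up to 101 bytes into those 11
   s.name[MAX_NAME_SIZE] = '\0';                 // stores a byte far past the end of s.name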

The easy way to fix it is to let the compiler calculate the size of the destination field:

   ...
   something s;
   strncpy( s.name, source, sizeof s.name );
   s.name[(sizeof s.name)-1] = '\0';

Wednesday, May 15, 2013

C++ that isn't (how to use std containers and algorithms in place of C constructs)

I like stackoverflow a lot. It saved me more than once. 

Whenever I can, I try to help other people out to pay it forward.

When looking for questions tagged C++ (ones I feel reasonably comfortable answering), every so often there is one asking about data structures and algorithms. 

However, the question is almost entirely written in C code, like this one (paraphrased):


The goal of this exercise is to remove duplicates from an array. 
Ask the user to enter 10 integers and remove the duplicate array elements. For example, given the input 1,2,2,3,3,4,4,4,5,6, the output should be 1,2,3,4,5,6.
The sample (and broken) code in the question was something along the lines of nested loops:
 #include <iostream>  
 using namespace std;  
 int main()  
 {  
   int numbers[10];  
   cout << "Enter 10 integers pressing return after each one\n";  
   for( int i=0; i<10; i++ )  
    cin >> numbers[i];  
   int count=10;  
   for( int i=0; i<=10; i++ )  
   {  
    for( int x=i+1; x<=10; x++ )  
    {  
      if( numbers[x]==numbers[i] )  
      {  
       count--;  
       for( int t=x; t<=count; t++ )  
         numbers[t] = numbers[t+1];  
      }  
    }  
   }  
   cout << endl;  
   cout << " You entered "<< count << " unique numbers: " << endl;  
   for( int i=0; i<count; i++ )  
    cout << numbers[i] << " ";  
   return 0;  
 }  

The usual response is to fix the code.

However, if the question is tagged as C++, a better solution would be to point the person to C++ data structures and algorithms.

The code below solves the problem using std::sort and std::unique (with a little help from vector::erase to clean up the vector).

 #include <iostream>  
 #include <vector>  
 #include <algorithm>  
 #include <iterator>  
 using namespace std;  
   
 int main(int argc, const char * argv[])  
 {  
   vector<int> v;  
   
   cout << "Please input 10 integers, hitting return after each one \n";  
   for( int i = 0; i < 10; i++ ) {  
    int num;  
    cin >> num;  
    v.push_back(num);  
   }  
   
   sort( v.begin(), v.end() );  
   v.erase( unique( v.begin(), v.end() ), v.end() );  
   
   cout << endl << " You entered " << v.size() << " unique numbers: " << endl;  
   copy( v.begin(), v.end(), ostream_iterator<int>( cout, " " ) );  
 }  

After I offered that solution, a fellow stackoverflow user offered an even better one using std::set:

 #include <iostream>  
 #include <set>  
 #include <algorithm>  
 #include <iterator>  
   
 using namespace std;  
   
 int main(int argc, const char* argv[])  
 {  
   set<int> s;  
   
   cout << "Please input 10 integers, hitting return after each one \n";  
   for( int i = 0; i < 10; i++ ) {  
    int num;  
    cin >> num;  
    s.insert(num);  
   }  
   cout << endl << " You entered " << s.size() << " unique numbers: " << endl;  
   copy( s.begin(), s.end(), ostream_iterator<int>( cout, " " ) );  
 }  
   


There are many great books out there about C++ styles and idioms (covering C++ data structures, algorithms and more). 

The ones I like the most:

Accelerated C++: Practical Programming by Example: short and to the point C++ book. If you have time for nothing else when learning C++, go over this book. I guarantee you will know more C++ than the average programmer out there (if my sample is representative). Also great when transitioning from C or Java (and related languages) to C++.

Effective C++: 55 Specific Ways to Improve Your Programs and Designs (3rd Edition): specific techniques to improve C++ programming. They can be read once a day, as individual topics. In about two months, your C++ will be much improved.

The C++ Standard Library: A Tutorial and Reference (2nd Edition): a reference book, useful even in the age of Google. One of the books I use the most when writing code. I had to buy it two or three times because my copies "created legs". I didn't mind. I'm sure it helped my fellow programmers. I'd rather work with well informed colleagues anyway. Learned a lot from this one. 

Sunday, May 12, 2013

Java: using class variables for performance

I often see code like this:

    public Date parse( String s ) throws ParseException  
    {  
       SimpleDateFormat parser = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" );  
       Date date = parser.parse( s );  
       return date;  
    }  

Creating an object that never changes, such as the SimpleDateFormat object, inside the function is a huge waste of time.

The code would be more efficient if the object was created only once as a data member:


    private final SimpleDateFormat parser_ = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" );  
    public Date parse( String s ) throws ParseException  
    {  
       Date date = parser_.parse( s );  
       return date;  
    }  

This is a test program to check how much faster the new code is. See some actual numbers from a Mac OS X run after the program.


import java.util.Date;
import java.text.ParseException;
import java.text.SimpleDateFormat;


public class test {

   public static class SlowVersion {
      public Date parse( String s ) throws ParseException {
         SimpleDateFormat parser = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" );
         Date date = parser.parse( s );
         return date;
      }
   }

   public static class FasterVersion {   
      private final SimpleDateFormat parser_ = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" );
      public Date parse( String s ) throws ParseException
      {
         Date date = parser_.parse( s );
         return date;
      }
   }

   public static void main(String[] args) throws ParseException {

      System.out.println( "Tested with JRE " + System.getProperty("java.version") );
      System.out.println( "Running on "+ System.getProperty("os.name") +
            " " + System.getProperty("os.version") );

      long startTime = System.currentTimeMillis();
      SlowVersion slow = new SlowVersion();
      for( int i = 0; i < 1000; i++ )
         slow.parse( "2013-03-14 12:34:56" );
      long endTime = System.currentTimeMillis();
      System.out.println( "Execution time:" + (endTime-startTime) + " ms" );

      startTime = System.currentTimeMillis();
      FasterVersion faster = new FasterVersion();
      for( int i = 0; i < 1000; i++ )
         faster.parse( "2013-03-14 12:34:56" );
      endTime = System.currentTimeMillis();
      System.out.println( "Execution time:" + (endTime-startTime) + " ms" );
   }
}   


These are three runs with that code on a Mac OS X 10.7 with JRE 1.7:


Tested with JRE 1.7.0_21
Running on Mac OS X 10.7.5
Execution time:331 ms
Execution time:83 ms

Tested with JRE 1.7.0_21
Running on Mac OS X 10.7.5
Execution time:322 ms
Execution time:114 ms

Tested with JRE 1.7.0_21
Running on Mac OS X 10.7.5
Execution time:368 ms
Execution time:102 ms


The small change makes the code about three times faster.

It is optimization in the small, but it is a simple one. Several small optimizations may add up in the end.

If you test in a different environment, please post your numbers as comments, just out of curiosity, to see how other operating systems and JREs perform.