Monday, September 30, 2013

My all-time favorite Stack Overflow question/answer

I spend quote a bit of time in Stack Overflow. Not only to answer questions (which I'm not very good at), but also to learn from great answers.

My all-time favorite question/answer is this one, about branch prediction

The question is deceptively simple. There are barely five paragraphs of text and the sample code. 

The first answer, the accepted one, is my all-time favorite answer. 

It is great in at least two-levels:


  • It shows what the CPU is doing behind the scenes. It shows that the CPU is not just slave machine, doing exactly what the lines of code are asking for.

  • It shows what the compilers do behind the scenes, how different optimization levels affect the code, how different compilers behave significantly different from each other. It shows that the compiler is also not a slave translator, creating object code straight out of the code.


For readers with a software-only background, it opens the eyes to the CPU and how much the input to a program affects the performance. 

For readers who skipped (or never took) the compiler class, it opens the eye to the compiler and how it is not a one-to-one translator of source code to object code.

Finally, the comments in the question and in the accepted answer are a great exchange of ideas, including the effects of caching, L1 and L2 caches and many other topics to investigate and learn more.

And the final drop in the bucket of "greatness" is from one of the comments:


One of the few questions ever where the question links to Wikipedia and the Wikipedia entry links to this question. 

The Wikipedia entry is here. The question is in the External Links section and the answer refers to the Wikipedia entry in the text. Talk about circular reference...

And this obfuscated code question is my second favorite question/answer. The question is an exercise on problem solving: how to methodically untangle the question, bit by bit, until it makes sense. It is also incredible that the obfuscated code actually runs.

Tuesday, July 9, 2013

Google Play: see reviews from other countries

A problem I had with the Android app we work on: how to check what users from other countries are saying about it? More precisely: how to check what users from other countries see when they go to Google Play?

Google Play shows scores (the stars) from all countries, but not all reviews.

An undocumented (to my knowledge) trick is to force Google Play to use a specific language.

For example, click here to check out what Portuguese speakers see when they go to the Angry Birds app.

The trick is to specify the language in the URL: https://play.google.com/store/apps/details?id=com.rovio.angrybirds&hl=pt.

Angry Birds in a few more languages:



Google Play uses the standard language codes from ISO 639-2.

What about the Apple Store?

Other than logging in to the developer account, I couldn't find an easy to see what users from other countries are saying.

It is not exactly what they see they are looking at the app from another country, but the information is there.

If you know of a way to see what other countries see, please let me know.

Thursday, June 6, 2013

How to give a killer presentation

This comes from this Harvard Business Review post.

The piece that caught my eye was this part of the article:

He was painfully shy. His English was halting. When he tried to describe his invention, the sentences tumbled out incoherently.
The quote above describes several software developers I know, downright to the (for other, "normal" people) halting English.

The story is about a twelve-year-old Masai boy, Richard, who gave a TED speech about his invention to protect cows from nightly attacks from lions. He is the painfully shy "he" quoted above who ended up presenting at TED (over a thousand people in the audience, in an imposing setting).

The article is worth reading even if you are not giving presentations to such large audiences because most large software products today are (as expected) developed by large groups.

If you want to lead the group into some direction (for example, convince everyone to use Gerrit for code reviews), you will need presentation skills. 

Perhaps not at the level of a TED presentation, but it won't hurt to get tips for the worst case scenario.







Guru of The Week (GoTW) - Herb Sutter is back

Herb Sutter is active again with his GoTW columns. He has been for a few weeks now.

The latest one is about using unique_ptr to clearly define ownership of pointers. See it here. From there you can jump to his main page and explore other columns.

unique_ptr replaces auto_ptr. This latest column is a great introduction to it, if you grew up with auto_ptr (as I did) and need to understand why it is better to move on to unique_ptr now.

If you never heard of Herb Sutter before, a good place to start is his C++ Coding Standards book. It's a gentle introduction to good C++ practices.

From there you can move on to Exceptional C++ and More Exceptional C++ books. They are denser reads than the Coding Standards book, but well worth the time. They are nicely structured around relatively short chapters. Some of the chapters build on top of the previous one, so it makes more sense to read the chapters in the sequence they are presented in the book.

I learned a lot about C++ with these books. One of the best investments of time I did for C++.





Wednesday, May 29, 2013

The futility of catching std::bad_alloc

After debating the merits of testing for NULL after calling the operator new (see this post), the first reaction is to start coding like this:

   try {
      char* p = new char[256];
      ...many lines of code
   }
   catch( std::bad_alloc& b ) {
      // error handling
   }

For most applications out there, this is just clutter. It makes the code more difficult to read without providing any benefit to justify the added lines of code (disclaimer: I understand that for some applications careful memory management is important - on the other hand, these type of applications will likely not use the global operator new).

It will also make the code bigger. Everything inside the catch block will generate object code, but will rarely, if ever, be executed. 

And, if it is executed, what exactly are we supposed to do inside the catch block?

It will rarely, if ever, be executed

Modern operating systems will allocate lots of virtual memory to a process before memory allocation fails. This is a test program that asks for 4 GB of memory (the size of physical memory on the machine I tested it). 
Side note: The strcpy in the code is just to use the variable p for something. Without that the compiler may optimize away the (unused) variable, which then skips the memory allocation altogether.
 #include <new>  
 #include <iostream>  
 #include <string.h>  
   
 using namespace std;  
   
 int main(int argc, const char* argv[]) {  
   for( int i = 0; i < 4000; i++ ) {  
    try {  
      cout << "Allocated " << i+1 << " MB so far" << endl;  
      char* p = new char[1024 * 1024];  
      strcpy( p, "test" );  
    } catch (std::bad_alloc& b) {  
      cout << "Finally failed" << endl;  
      break;  
    }  
   }  
   
   cout << "Press enter to exit";  
   cin.ignore();  
 }  


And here is the result: the operating system (OS X 10.10 in this case) gladly gave 4 GB of memory to the program without ever throwing std::bad_alloc:

   ...
   Allocated 3998 MB so far
   Allocated 3999 MB so far
   Allocated 4000 MB so far

   Press enter to exit

And the operating system reports most of the memory as virtual memory.



Once a process gets to that point, paging will be a serious problem. The process will run slowly, to the point of being useless (and very likely affect other processes on that machine).

The exact behavior varies by operating system and the configuration of the machine. This is what happens on Ubuntu running under VirtualBox, configured with 2 GB of base memory:

   ...
   Allocated 3053 MB so far
   Allocated 3054 MB so far
   Allocated 3055 MB so far
   Finally failed
   Press enter to exit

Although in this case the operating system did eventually return an error, the point is the same: it will give gobs of memory to the process before it fails.

What exactly are we supposed to do inside the catch block?

If we do eventually get a std::bad_alloc, what exactly can or should do inside the catch block?

The first answer is usually to log something and return an error:

   catch( std::bad_alloc& b ) {
      ...log error
      return ERR_FAILED_TO_ALLOCATE_MEMORY;
   }

There are a few problems with that:


  • If logging an error requires memory (e.g. allocate buffers for the logging system), it will also fail and throw std::bad_alloc.
  • What exactly is the caller function supposed to do with ERR_FAILED_TO_ALLOCATE_MEMORY? We are just passing the buck up the chain.


It is hard to find anything useful or guaranteed to work to put inside the catch block.

In most cases, it is a better alternative to let the program die and restart it in a clean state. Everything else will most likely result in a program that is limping along, paging most of the time, instead of doing useful work.

What about a new-handler function?

This is the point where a well-informed C++ programmer will ask if we can do anything useful by installing a new-handler.

But it again begs the question: what is "useful" under these circumstances. The new-handler can release some memory back to the system to allow the new operator to work.

There are again problems with that:

  • Where does the memory come from? It is probably from a stash the program saved earlier on. The stash cannot be all that big or it will just make the problem happen sooner.
  • How does it is (really) solve the problem? If we have a real leak, release a stash of memory we saved earlier will just allow the program to run a little longer and fail yet again.
If there is anything useful to be done in a new-handler is to nicely document the problem we had so we can more easily debug it later.

This usually means two things:
  • Log a clear error message. Logging the message itself may require some memory, so we need to save a small amount of memory and release it later to ensure our logging functions work.
  • Generate a core dump file for systems that support it. This one can be tricky because the core dump will be very large (it has the entire memory footprint for the process, which by now must be large, since we just ran out of memory).
Here is a sample program that installs a new-handler, reserving a bit of memory to ensure the log works and generates a core dump by calling abort():

 #include <new>  
 #include <iostream>  
 #include <string.h>  
 #include <stdlib.h>  
   
 using namespace std;  
   
 static char* p = new char[1024*10];  
 void new_failed() {  
   // Make sure we have some memory to work with in this function  
   delete[] p;  
   // Log a clear message about the problem we hit  
   cout << "Ran out of memory" << endl;  
   // Will generate a core dump in some systems - useful to debug later  
   // ...but be careful because it will likely be a huge file because of  
   // the amount of memory allocated so far  
   abort();  
 }  
   
 int main(int argc, const char* argv[]) {  
   
   set_new_handler(new_failed);  
   
   for( int i = 0; i < 4000; i++ ) {  
    try {  
      cout << "Allocated " << i+1 << " MB so far" << endl;  
      char* p = new char[1024 * 1024];  
      strcpy( p, "test" );  
    } catch (std::bad_alloc& b) {  
      cout << "Finally failed" << endl;  
      break;  
    }  
   }  
   
   cout << "Press enter to exit";  
   cin.ignore();  
 }  
   


This is what happens when it fails:

   ...
   Allocated 3053 MB so far
   Allocated 3054 MB so far
   Allocated 3055 MB so far
   Ran out of memory

   Aborted (core dumped)


Tools for memory leak detection

A process that runs out of memory has a memory leak in 99.999% of the cases.

There are many great tools out there to find memory leaks. My favorite is valgrind (side note: here is how to pronounce the name of that tool).

If you are developing in a Mac with XCode, run you project with ProductProfile and choose the Leaks option and Memory.

The futility of checking for null after calling new

Before reading further: we are talking about the global operator new with default options. The nothrow version will return a null pointer on failure and an overridden new operator may do pretty much as it wishes.

A piece of code that is still easy to find out there:

   char* p = new char[256];
   if( p == NULL ) {
      // error handling
   }

It is "C disguised as C++" code. In C malloc does return null pointer on failure, but all modern C++ compilers do what the C++ standard asks for: throw std::bad_alloc when it fails to allocate memory. The global operator new will never return a null pointer.

These lines...


   if( p == NULL ) {
      // error handling
   }

...are dead code in C++. The if statement simply costs time and block of code after that will never be executed.

This got more attention in a project where we decide to measure code coverage using automated tests. To reach our coverage goal (a certain percentage of the total lines of code), we started removing dead pieces of code wherever we could find them. This one of them. [We also added more automated tests, of course;)]

Once this is understood, the first reaction is to rewrite the code:

   try {
      char* p = new char[256];
   }
   catch( std::bad_alloc& b ) {
      // error handling
   }

This is (for most programs out there) yet another exercise in futility, discussed here.

Wednesday, May 22, 2013

C++: generating assembly code with gcc/g++ for fun and profit

While discussing how to use strncpy in a really safe way with a colleague (see this post), we got stuck in the performance part: is one of the alternatives faster than the other?

More precisely, is this version (the potentially unsafe one) faster because it uses a constant that can be resolved at compile time for sure?

   strncpy( s.name, "test", MAX_NAME_SIZE );

Or is this version equally as fast because sizeof is also resolved at compile time?


   strncpy( s.name, "test", sizeof s.name );

This could be settled by going to the C++ standard documentation, but you have to pay for it.

A free copy of the C99 standard is available here, but it has two problems:


  1. It's not the C++ standard. It should be about the same in this area, but there is no guarantee.
  2. There is still room for interpretation in the text (see below), although later examples in the document show more evidence that it is evaluated at compile time, you really need to dig into it to make a point.

The text from the document:


6.5.3.4 The sizeof and _Alignof operators
... 
2 The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an
integer constant.
After going through all that, we decided to settle the argument in a simpler way:


  • Get the assembly code for both versions
  • If they are the same, then sizeof is indeed evaluated at compile time (for this case, at least) and there is no performance penalty for the safer code.
The challenge then is to get the assembly code from a C/C++ program.

This is the test program:

 #include <string.h>  
   
 const int MAX_NAME_SIZE = 100;  
 struct something {   
   char name[MAX_NAME_SIZE+1];  
 };  
   
 int main( int argc, const char* argv[] ) {   
   something s;  
   
   strncpy( s.name, "test", MAX_NAME_SIZE+1 );   
   s.name[MAX_NAME_SIZE] = '\0';  
   
   strncpy( s.name, "test", sizeof s.name );  
   s.name[sizeof s.name - 1] = '\0';  
 }

Note: unfortunately the recipe below will not work with the clang family (standard in Mac OS and others). Its assembler does not support interleaved listing. The best it can do is the regular listing (more details in this SO question): g++ -g -O2 -S test.cpp >test.s.

When compiled with these options, we get the assembly language combined with the C++ source code into the test.s file:

   g++ -g -O2 -Wa,-aslh test.cpp >test.s

Before going into the results, this is what each option does:


  • -g: Generate symbols. Without this option the assembler will not be able to match the source code lines. Much harder to read the output without this option.
  • -O2: Optimization level. Strictly speaking, not needed for this exercise, but it is always a good idea to build at the same optimization level used in the released code. It can affect the resulting assembly code significantly.
  • -Wa: Pass options to the assembler. This is the hint that the next options, after the comma, should be passed on to the assembler.
  • -aslh: Assembler options. All -a options are for listing. In this case we asked to include the assembly code (-al), include symbols (-as) and, the most important one, include source code (-ah). These options are explained in more details in the assembler documentation. Just run man as to see them.
And now, the results. What follows is a cleaned up version of the assembly listing.

First, the strncpy( ... MAX_NAME_SIZE ) version, showing that the constant is used, as expected:

  34 000e C7442408              movl    $101, 8(%esp)
  35 0016 C7442404              movl    $.LC0, 4(%esp)
  36 001e 891C24                movl    %ebx, (%esp)
  41 0021 65A11400              movl    %gs:20, %eax
  42 0027 8944247C              movl    %eax, 124(%esp)
  43 002b 31C0                  xorl    %eax, %eax
  49 002d E8FCFFFF              call    strncpy

Now the strncpy( ... sizeof s.name ) version, showing that it also uses a constant, settling the argument (same object code, thus same performance):

  55 0032 C7442408              movl    $101, 8(%esp)
  56 003a C7442404              movl    $.LC0, 4(%esp)
  57 0042 891C24                movl    %ebx, (%esp)
  58 0045 E8FCFFFF              call    strncpy





Tuesday, May 21, 2013

Writing unsafe code with strncpy

Buffer overflow is a common source of security vulnerabilities. Just look at the CERT list for current issues.

In C and C++, a common source of buffer overflow is the string manipulations functions like strcat, sprintf and strcpy.

In an effort to write secure code with strcpy, I have seen code like this, using strncpy to prevent the buffer overflow:

In a header file somewhere:


   const int MAX_NAME_SIZE = 100;
   struct something
   {
      char name[MAX_NAME_SIZE+1];
   };

In some other place:

   ...
   something s;
   strncpy( s.name, source, MAX_NAME_SIZE+1 );
   s.name[MAX_NAME_SIZE] = '\0';

This is bad code.

It looks safe, but it is not.

Some day someone will make this change for a perfectly valid reason:

   const int NEW_MAX_NAME_SIZE = 10;
   struct something
   {
      char name[NEW_MAX_NAME_SIZE+1];
   };

If we are lucky, the old constant is deleted, causing a compiler error that forces a review of the code.

If the old constant is left around, we now have a very big bug in the code: despite the use of strncpy, there is a buffer overflow in this code.

The easy way to fix it is to refer to let the compiler calculated the size of the destination field:

   ...
   something s;
   strncpy( s.name, source, sizeof s.name );
   s.name[(sizeof s.name)-1] = '\0';

Wednesday, May 15, 2013

C++ that isn't (how to use std containers and algorithms in place of C constructs)

I like stackoverflow a lot. It saved me more than once. 

Whenever I can, I try to help other people out to pay it forward

When looking for questions tagged C++ (that I fell reasonably comfortable answering), every so often there is one that is asking a question related to data structures and algorithms. 

However, the question is almost entirely written in C code, like this one (paraphrased):


The goal of this exercise is to remove duplicates from an array. 
Ask the user to enter 10 integers and remove the duplicate array elements. For example, given the input 1,2,2,3,3,4,4,4,5,6, the output should be 1,2,3,4,5,6.
The sample (and broken) code in the question was something in the lines of nested loops:
 #include <iostream>  
 using namespace std;  
 int main()  
 {  
   int numbers[10];  
   cout << "Enter 10 integers pressing return after each one\n";  
   for( int i=0; i<10; i++ )  
    cin >> numbers[i];  
   int count=10;  
   for( int i=0; i<=10; i++ )  
   {  
    for( int x=i+1; x<=10; x++ )  
    {  
      if( numbers[x]==numbers[i] )  
      {  
       count--;  
       for( int t=x; t<=count; t++ )  
         numbers[t] = numbers[t+1];  
      }  
    }  
   }  
   cout << endl;  
   cout << " You entered "<< count << " unique numbers: " << endl;  
   for( int i=0; i<count; i++ )  
    cout << numbers[i] << " ";  
   return 0;  
 }  

The usual response is to fix the code.

However, if the question is tagged as C++, a better solution would be to point the person to C++ data structures and algorithms.

The code below solves the problem using std::sort and std::unique (with a little help of std::erase to clean up the vector).

 #include <iostream>  
 #include <vector>  
 using namespace std;  
   
 int main(int argc, const char * argv[])  
 {  
   vector<int> v;  
   
   cout << "Please input 10 integers, hitting return after each one \n";  
   for( int i = 0; i < 10; i++ ) {  
    int num;  
    cin >> num;  
    v.push_back(num);  
   }  
   
   sort( v.begin(), v.end() );  
   v.erase( unique( v.begin(), v.end() ), v.end() );  
   
   cout << endl << " You entered " << v.size() << " unique numbers: " << endl;  
   copy( v.begin(), v.end(), ostream_iterator<int>( cout, " " ) );  
 }  

After I offered that solution, a fellow stackoverflow user offered an even better one using std::set:

 #include <iostream>  
 #include <set>  
 #include <iterator>  
   
 using namespace std;  
   
 int main(int argc, const char* argv[])  
 {  
   set<int> s;  
   
   cout << "Please input 10 integers, hitting return after each one \n";  
   for( int i = 0; i < 10; i++ ) {  
    int num;  
    cin >> num;  
    s.insert(num);  
   }  
   cout << endl << " You entered " << s.size() << " unique numbers: " << endl;  
   copy( s.begin(), s.end(), ostream_iterator<int>( cout, " " ) );  
 }  
   


There are many great books out there about C++ styles and idioms (covering C++ data structures, algorithms and more). 

The ones I like the most:

Accelerated C++: Practical Programming by Example: short and to the point C++ book. If you have time for nothing else when learning C++, go over this book. I guarantee you will know more C++ than the average programmer out there (if my sample is representative). Also great when transitioning from C or Java (and related languages) to C++.

Effective C++: 55 Specific Ways to Improve Your Programs and Designs (3rd Edition): specific techniques to improve C++ programming. They can be read once a day, as individual topics. In about two months, your C++ will be much improved.

The C++ Standard Library: A Tutorial and Reference (2nd Edition): a reference book, useful even in the age of Google. One of the books I use the most when writing code. I had to buy it two or three times because my copies "created legs". I didn't mind. I'm sure it helped my fellow programmers. I'd rather work with well informed colleagues anyway. Learned a lot from this one. 

Sunday, May 12, 2013

Java: using class variables for performance

I often see code like this:

    public Date parse( String s ) throws ParseException  
    {  
       SimpleDateFormat parser = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" );  
       Date date = parser.parse( s );  
       return date;  
    }  

Creating an object that never changes, as the SimpleDateFormat object, inside the function is a huge waste of time.

The code would be more efficient if the object was created only once as a data member:


    private final SimpleDateFormat parser_ = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" );  
    public Date parse( String s ) throws ParseException  
    {  
       Date date = parser_.parse( s );  
       return date;  
    }  

This is a test program to check how fast the new code is. See some actual numbers on a Mac OS X after the program.


import java.util.Date;
import java.text.ParseException;
import java.text.SimpleDateFormat;


public class test {

   public static class SlowVersion {
      public Date parse( String s ) throws ParseException {
         SimpleDateFormat parser = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" );
         Date date = parser.parse( s );
         return date;
      }
   }

   public static class FasterVersion {   
      private final SimpleDateFormat parser_ = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" );
      public Date parse( String s ) throws ParseException
      {
         Date date = parser_.parse( s );
         return date;
      }
   }

   public static void main(String[] args) throws ParseException {

      System.out.println( "Tested with JRE " + System.getProperty("java.version") );
      System.out.println( "Running on "+ System.getProperty("os.name") +
            " " + System.getProperty("os.version") );

      long startTime = System.currentTimeMillis();
      SlowVersion slow = new SlowVersion();
      for( int i = 0; i < 1000; i++ )
         slow.parse( "2013-03-14 12:34:56" );
      long endTime = System.currentTimeMillis();
      System.out.println( "Execution time:" + (endTime-startTime) + " ms" );

      startTime = System.currentTimeMillis();
      FasterVersion faster = new FasterVersion();
      for( int i = 0; i < 1000; i++ )
         faster.parse( "2013-03-14 12:34:56" );
      endTime = System.currentTimeMillis();
      System.out.println( "Execution time:" + (endTime-startTime) + " ms" );
   }
}   


These are three runs with that code on a Mac OS X 10.7 with JRE 1.7:


Tested with JRE 1.7.0_21
Runnign on Mac OS X 10.7.5
Execution time:331 ms
Execution time:83 ms

Tested with JRE 1.7.0_21
Runnign on Mac OS X 10.7.5
Execution time:322 ms
Execution time:114 ms

Tested with JRE 1.7.0_21
Runnign on Mac OS X 10.7.5
Execution time:368 ms
Execution time:102 ms


The small change makes the code about three times faster.

It is optimization in the small, but it is a simple one. Several small optimizations may add up in the end.

If you test on a different environment: please post your numbers as comments.  Just out of curiosity, to see how other operating systems and JRE perform.