Sunday, January 23, 2022

New blog

I switched to GitHub Pages.

Most of my projects have some coding bits. Having the projects and the blog on Github simplifies the workflow.


Monday, September 30, 2013

My all-time favorite Stack Overflow question/answer

I spend quote a bit of time in Stack Overflow. Not only to answer questions (which I'm not very good at), but also to learn from great answers.

My all-time favorite question/answer is this one, about branch prediction

The question is deceptively simple. There are barely five paragraphs of text and the sample code. 

The first answer, the accepted one, is my all-time favorite answer. 

It is great in at least two-levels:


  • It shows what the CPU is doing behind the scenes. It shows that the CPU is not just slave machine, doing exactly what the lines of code are asking for.

  • It shows what the compilers do behind the scenes, how different optimization levels affect the code, how different compilers behave significantly different from each other. It shows that the compiler is also not a slave translator, creating object code straight out of the code.


For readers with a software-only background, it opens the eyes to the CPU and how much the input to a program affects the performance. 

For readers who skipped (or never took) the compiler class, it opens the eye to the compiler and how it is not a one-to-one translator of source code to object code.

Finally, the comments in the question and in the accepted answer are a great exchange of ideas, including the effects of caching, L1 and L2 caches and many other topics to investigate and learn more.

And the final drop in the bucket of "greatness" is from one of the comments:


One of the few questions ever where the question links to Wikipedia and the Wikipedia entry links to this question. 

The Wikipedia entry is here. The question is in the External Links section and the answer refers to the Wikipedia entry in the text. Talk about circular reference...

And this obfuscated code question is my second favorite question/answer. The question is an exercise on problem solving: how to methodically untangle the question, bit by bit, until it makes sense. It is also incredible that the obfuscated code actually runs.

Tuesday, July 9, 2013

Google Play: see reviews from other countries

A problem I had with the Android app we work on: how to check what users from other countries are saying about it? More precisely: how to check what users from other countries see when they go to Google Play?

Google Play shows scores (the stars) from all countries, but not all reviews.

An undocumented (to my knowledge) trick is to force Google Play to use a specific language.

For example, click here to check out what Portuguese speakers see when they go to the Angry Birds app.

The trick is to specify the language in the URL: https://play.google.com/store/apps/details?id=com.rovio.angrybirds&hl=pt.

Angry Birds in a few more languages:



Google Play uses the standard language codes from ISO 639-2.

What about the Apple Store?

Other than logging in to the developer account, I couldn't find an easy to see what users from other countries are saying.

It is not exactly what they see they are looking at the app from another country, but the information is there.

If you know of a way to see what other countries see, please let me know.

Thursday, June 6, 2013

How to give a killer presentation

This comes from this Harvard Business Review post.

The piece that caught my eye was this part of the article:

He was painfully shy. His English was halting. When he tried to describe his invention, the sentences tumbled out incoherently.
The quote above describes several software developers I know, downright to the (for other, "normal" people) halting English.

The story is about a twelve-year-old Masai boy, Richard, who gave a TED speech about his invention to protect cows from nightly attacks from lions. He is the painfully shy "he" quoted above who ended up presenting at TED (over a thousand people in the audience, in an imposing setting).

The article is worth reading even if you are not giving presentations to such large audiences because most large software products today are (as expected) developed by large groups.

If you want to lead the group into some direction (for example, convince everyone to use Gerrit for code reviews), you will need presentation skills. 

Perhaps not at the level of a TED presentation, but it won't hurt to get tips for the worst case scenario.







Guru of The Week (GoTW) - Herb Sutter is back

Herb Sutter is active again with his GoTW columns. He has been for a few weeks now.

The latest one is about using unique_ptr to clearly define ownership of pointers. See it here. From there you can jump to his main page and explore other columns.

unique_ptr replaces auto_ptr. This latest column is a great introduction to it, if you grew up with auto_ptr (as I did) and need to understand why it is better to move on to unique_ptr now.

If you never heard of Herb Sutter before, a good place to start is his C++ Coding Standards book. It's a gentle introduction to good C++ practices.

From there you can move on to Exceptional C++ and More Exceptional C++ books. They are denser reads than the Coding Standards book, but well worth the time. They are nicely structured around relatively short chapters. Some of the chapters build on top of the previous one, so it makes more sense to read the chapters in the sequence they are presented in the book.

I learned a lot about C++ with these books. One of the best investments of time I did for C++.





Wednesday, May 29, 2013

The futility of catching std::bad_alloc

After debating the merits of testing for NULL after calling the operator new (see this post), the first reaction is to start coding like this:

   try {
      char* p = new char[256];
      ...many lines of code
   }
   catch( std::bad_alloc& b ) {
      // error handling
   }

For most applications out there, this is just clutter. It makes the code more difficult to read without providing any benefit to justify the added lines of code (disclaimer: I understand that for some applications careful memory management is important - on the other hand, these type of applications will likely not use the global operator new).

It will also make the code bigger. Everything inside the catch block will generate object code, but will rarely, if ever, be executed. 

And, if it is executed, what exactly are we supposed to do inside the catch block?

It will rarely, if ever, be executed

Modern operating systems will allocate lots of virtual memory to a process before memory allocation fails. This is a test program that asks for 4 GB of memory (the size of physical memory on the machine I tested it). 
Side note: The strcpy in the code is just to use the variable p for something. Without that the compiler may optimize away the (unused) variable, which then skips the memory allocation altogether.
 #include <new>  
 #include <iostream>  
 #include <string.h>  
   
 using namespace std;  
   
 int main(int argc, const char* argv[]) {  
   for( int i = 0; i < 4000; i++ ) {  
    try {  
      cout << "Allocated " << i+1 << " MB so far" << endl;  
      char* p = new char[1024 * 1024];  
      strcpy( p, "test" );  
    } catch (std::bad_alloc& b) {  
      cout << "Finally failed" << endl;  
      break;  
    }  
   }  
   
   cout << "Press enter to exit";  
   cin.ignore();  
 }  


And here is the result: the operating system (OS X 10.10 in this case) gladly gave 4 GB of memory to the program without ever throwing std::bad_alloc:

   ...
   Allocated 3998 MB so far
   Allocated 3999 MB so far
   Allocated 4000 MB so far

   Press enter to exit

And the operating system reports most of the memory as virtual memory.



Once a process gets to that point, paging will be a serious problem. The process will run slowly, to the point of being useless (and very likely affect other processes on that machine).

The exact behavior varies by operating system and the configuration of the machine. This is what happens on Ubuntu running under VirtualBox, configured with 2 GB of base memory:

   ...
   Allocated 3053 MB so far
   Allocated 3054 MB so far
   Allocated 3055 MB so far
   Finally failed
   Press enter to exit

Although in this case the operating system did eventually return an error, the point is the same: it will give gobs of memory to the process before it fails.

What exactly are we supposed to do inside the catch block?

If we do eventually get a std::bad_alloc, what exactly can or should do inside the catch block?

The first answer is usually to log something and return an error:

   catch( std::bad_alloc& b ) {
      ...log error
      return ERR_FAILED_TO_ALLOCATE_MEMORY;
   }

There are a few problems with that:


  • If logging an error requires memory (e.g. allocate buffers for the logging system), it will also fail and throw std::bad_alloc.
  • What exactly is the caller function supposed to do with ERR_FAILED_TO_ALLOCATE_MEMORY? We are just passing the buck up the chain.


It is hard to find anything useful or guaranteed to work to put inside the catch block.

In most cases, it is a better alternative to let the program die and restart it in a clean state. Everything else will most likely result in a program that is limping along, paging most of the time, instead of doing useful work.

What about a new-handler function?

This is the point where a well-informed C++ programmer will ask if we can do anything useful by installing a new-handler.

But it again begs the question: what is "useful" under these circumstances. The new-handler can release some memory back to the system to allow the new operator to work.

There are again problems with that:

  • Where does the memory come from? It is probably from a stash the program saved earlier on. The stash cannot be all that big or it will just make the problem happen sooner.
  • How does it is (really) solve the problem? If we have a real leak, release a stash of memory we saved earlier will just allow the program to run a little longer and fail yet again.
If there is anything useful to be done in a new-handler is to nicely document the problem we had so we can more easily debug it later.

This usually means two things:
  • Log a clear error message. Logging the message itself may require some memory, so we need to save a small amount of memory and release it later to ensure our logging functions work.
  • Generate a core dump file for systems that support it. This one can be tricky because the core dump will be very large (it has the entire memory footprint for the process, which by now must be large, since we just ran out of memory).
Here is a sample program that installs a new-handler, reserving a bit of memory to ensure the log works and generates a core dump by calling abort():

 #include <new>  
 #include <iostream>  
 #include <string.h>  
 #include <stdlib.h>  
   
 using namespace std;  
   
 static char* p = new char[1024*10];  
 void new_failed() {  
   // Make sure we have some memory to work with in this function  
   delete[] p;  
   // Log a clear message about the problem we hit  
   cout << "Ran out of memory" << endl;  
   // Will generate a core dump in some systems - useful to debug later  
   // ...but be careful because it will likely be a huge file because of  
   // the amount of memory allocated so far  
   abort();  
 }  
   
 int main(int argc, const char* argv[]) {  
   
   set_new_handler(new_failed);  
   
   for( int i = 0; i < 4000; i++ ) {  
    try {  
      cout << "Allocated " << i+1 << " MB so far" << endl;  
      char* p = new char[1024 * 1024];  
      strcpy( p, "test" );  
    } catch (std::bad_alloc& b) {  
      cout << "Finally failed" << endl;  
      break;  
    }  
   }  
   
   cout << "Press enter to exit";  
   cin.ignore();  
 }  
   


This is what happens when it fails:

   ...
   Allocated 3053 MB so far
   Allocated 3054 MB so far
   Allocated 3055 MB so far
   Ran out of memory

   Aborted (core dumped)


Tools for memory leak detection

A process that runs out of memory has a memory leak in 99.999% of the cases.

There are many great tools out there to find memory leaks. My favorite is valgrind (side note: here is how to pronounce the name of that tool).

If you are developing in a Mac with XCode, run you project with ProductProfile and choose the Leaks option and Memory.

The futility of checking for null after calling new

Before reading further: we are talking about the global operator new with default options. The nothrow version will return a null pointer on failure and an overridden new operator may do pretty much as it wishes.

A piece of code that is still easy to find out there:

   char* p = new char[256];
   if( p == NULL ) {
      // error handling
   }

It is "C disguised as C++" code. In C malloc does return null pointer on failure, but all modern C++ compilers do what the C++ standard asks for: throw std::bad_alloc when it fails to allocate memory. The global operator new will never return a null pointer.

These lines...


   if( p == NULL ) {
      // error handling
   }

...are dead code in C++. The if statement simply costs time and block of code after that will never be executed.

This got more attention in a project where we decide to measure code coverage using automated tests. To reach our coverage goal (a certain percentage of the total lines of code), we started removing dead pieces of code wherever we could find them. This one of them. [We also added more automated tests, of course;)]

Once this is understood, the first reaction is to rewrite the code:

   try {
      char* p = new char[256];
   }
   catch( std::bad_alloc& b ) {
      // error handling
   }

This is (for most programs out there) yet another exercise in futility, discussed here.