Recently, we have discovered an awful BUG affecting nspluginwrapper on some GNU/Linux 64 bit distributions while using Acrobat Reader’s nppdf.so plugin in Iceweasel / Firefox browsers.
Bug details:
As long as a certain PDF document is showed inside a tab or window’s browser using nspluginwrapper (no matter what version) through Acrobat Reader 32 bit plugin 8.X., process “acroread” starts increasing its HEAP size memory consumption until the O.S enters in Trashing mode.
A brief analysis about nspluginwrapper’s code
First of all, as soon as the nspluginwrapper starts its execution thanks to NPP_Initialize NPAPI function call, it hooks all NPP calls to its own routines:
3210 plugin_funcs->size = sizeof(NPPluginFuncs); 3211 plugin_funcs->version = NPW_NPAPI_VERSION; 3212 plugin_funcs->newp = NewNPP_NewProc(g_NPP_New); 3213 plugin_funcs->destroy = NewNPP_DestroyProc(g_NPP_Destroy); 3214 plugin_funcs->setwindow = NewNPP_SetWindowProc(g_NPP_SetWindow); 3215 plugin_funcs->newstream = NewNPP_NewStreamProc(g_NPP_NewStream); 3216 plugin_funcs->destroystream = NewNPP_DestroyStreamProc(g_NPP_DestroyStream); 3217 plugin_funcs->asfile = NewNPP_StreamAsFileProc(g_NPP_StreamAsFile); 3218 plugin_funcs->writeready = NewNPP_WriteReadyProc(g_NPP_WriteReady); 3219 plugin_funcs->write = NewNPP_WriteProc(g_NPP_Write); 3220 plugin_funcs->print = NewNPP_PrintProc(g_NPP_Print); 3221 plugin_funcs->event = NewNPP_HandleEventProc(g_NPP_HandleEvent); 3222 plugin_funcs->urlnotify = NewNPP_URLNotifyProc(g_NPP_URLNotify); 3223 plugin_funcs->javaClass = NULL; 3224 plugin_funcs->getvalue = NewNPP_GetValueProc(g_NPP_GetValue); 3225 plugin_funcs->setvalue = NewNPP_SetValueProc(g_NPP_SetValue); |
Reading NPAPI reference, NPP_Destroy event will be called as long as a new plugin instance has been closed – closing the tab, surfing to another URL location with different MIME type, etc -.
Inside NPP_Detroy event, any plugin must free all allocated memory in order to avoid memory lacks or excessive memory consumption. Thus, reading the sources, we can find out where all memory will be freed: in g_NPP_Destroy.
Finally, and only when there are no more instances running inside the browser, NPP_Shutdown method will be called from the browser, destroying – apart from all allocated memory -, npviewer.bin process, in charge of showing up the entire 32 bit plugin on the 64 bit browser. In particular, thinking about Acrobat Reader, the PDF document by itself:
3351 NPError 3352 NP_Shutdown(void) 3353 { 3354 NPError ret = g_NP_Shutdown(); 3355 3356 npobject_bridge_destroy(); 3357 3358 id_kill(); 3359 3360 return ret; 3361 } |
This issue can be demonstrated using pstree command:
|-firefox-bin-+-npviewer.bin---ld-linux.so.2 |
Here, ld-linux.so is, in fact, the acroread’s pre-loader process. On some distros, so long as lsb libraries are installed, process “acroread” loads perfectly with no pre-loader like ld-linux.so.
The main problem is that, sometimes, in spite of killing npviewer.bin rightly after processing g_NPP_Destroy or g_NPP_Shutdown functions, acroread process starts increasing its .data segment size (because of consecutively calls to brk() system-call), until the entire O.S enters in Trashing mode.
The first approach: Garbage-Inspector
Instead of trying to determine what the hell is going on with acroread process, quite difficult, I preferred to develop a “Garbage-Inspector” process in order to kill acroread process (if needed), avoiding memory lack and system trashing.
To do this, there are two main parts of code: one of them placed just inside nspluginwrapper project (npw-wrapper.c file) and the another one outside, in a shell-script bash file called garbage-inspector.sh.
The first one is in charge of, as long as g_NPP_Destroy is called, run – if needed -, the “Garbage-Inspector” shell script. When a new Plugin * object is referenced and created, its .data segment memory consumption is updated and stored inside a file on /var/log/nsplugin-pids/PID.
Garbage-Inspector iterates all over these PIDs stored on /var/log/nsplugin-pids/ directory, comparing current .data process consumption of all of them with their previous values stored thanks to my nspluginwrapper added code. As soon as any of them become superior than 3data kB, Garbage-Inspector kills ’em and annotates this action in the syslog.
You can figure out how Garbage-Inspector and nspluginwrapper modified code work reading the syslog:
669 Feb 11 16:36:31 taltos firefox-bin: PDF instances: 1 670 Feb 11 16:36:47 taltos firefox-bin: PDF instances: 2 671 Feb 11 16:36:50 taltos firefox-bin: Keeping an eye on PID: 20323 672 Feb 11 16:36:50 taltos firefox-bin: Removing instance, still: 1 673 Feb 11 16:36:59 taltos firefox-bin: PDF instances: 2 674 Feb 11 16:37:02 taltos firefox-bin: Keeping an eye on PID: 20323 675 Feb 11 16:37:02 taltos firefox-bin: Removing instance, still: 1 678 Feb 11 16:39:38 taltos firefox-bin: Keeping an eye on PID: 20323 679 Feb 11 16:39:38 taltos firefox-bin: Killing zombie PID 20323 680 Feb 11 16:39:42 taltos firefox-bin: Removing instance, still: -1 681 Feb 11 16:40:21 taltos firefox-bin: PDF instances: 1 682 Feb 11 16:40:29 taltos firefox-bin: PDF instances: 2 683 Feb 11 16:40:33 taltos firefox-bin: Keeping an eye on PID: 23052 685 Feb 11 16:40:33 taltos firefox-bin: Removing instance, still: 1 686 Feb 11 16:40:33 taltos logger: [Garbage-Inspector]: Starting at Wed Feb 11 16:40:33 CET 2009 687 Feb 11 16:40:39 taltos firefox-bin: PDF instances: 2 688 Feb 11 16:40:45 taltos firefox-bin: Keeping an eye on PID: 23052 689 Feb 11 16:40:45 taltos firefox-bin: Removing instance, still: 1 <strong><span style="color: #ff0000;">690 Feb 11 16:41:49 taltos logger: [Garbage-Inspector]: PID 23052 has been killed using: 60096 k</span><span style="color: #ff0000;">B.</span></strong> (...) 700 Feb 11 16:50:42 taltos logger: [Garbage-Inspector]: Exiting at WedFeb 11 16:50:42 CET 2009 |
How my Garbage-Inspector code looks like
Developed as a Shell-Script in BASH, Garbage-Inspector is based on lockfile , logger and trivial shell-scripting techniques. Its data is received trough /var/log/nsplugin-pids, one file per running acroread PID. As soon as no PIDS remain on this directory, Garbage-Inspector process finish its execution. As long as it’s running, there’s no much memory or processor consumption because of its own design. Here you have it:
PIDSDIR="/var/log/nsplugin-pids" ITIME=600 # Time in seconds while garbage will be running. LOCKFILE="$PIDSDIR/garbage-inspector.lock" # Disable CTRL+C trap '' 2 # # Avoid running more than one instance. Check for the # lock file, if exists, sleep 8 seconds and retry it again ... and again ... # lockfile -8 -r -1 $LOCKFILE logger -p daemon.info "[Garbage-Inspector]: Starting at `date`" # # Now iterate for ITIME seconds in the system, after this time # exit and remove lockfile # cursec=0 while [ $cursec -le $ITIME ]; do echo "Second: $cursec" # # Iterate inside any PID file, open it up and get its VmData "normal" # consumption # PIDS=`ls $PIDSDIR` for ipid in $PIDS; do # # Get number of PIDS inside the dir: # NPIDS=`ls $PIDSDIR|wc -l` if [ $NPIDS -le 1 ]; then # There are no PIDS, so exiting ... logger -p daemon.info "[Garbage-Inspector]: Exiting (no PIDS) at `date`" rm -rf $LOCKFILE exit 0 fi if [ "$PIDSDIR/$ipid" != "$LOCKFILE" ]; then echo -ne "Checking HEAP for PID $ipid ...: " heap=`cat $PIDSDIR/$ipid` max_heap=`expr $heap + $heap + $heap` # Max allowed .data is 3heap # # Get current heap # if [ -d /proc/$ipid ]; then # # In case of Zombie, just kill'em: # if [ -O $PIDSDIR/$ipid ]; then is_z=`cat /proc/$ipid/status |grep State|cut -d":" -f2|cut -d"(" -f1|tr -d '\t'` if [ $is_z == "Z" ]; then logger -p daemon.warning "[Garbage-Inspector]: PID $ipid has been killed because of becoming Zombie (Z)." kill -9 $ipid 2>/dev/null # Kill process rm -rf $PIDSDIR/$ipid # Remove status file else curheap=`cat /proc/$ipid/status |grep VmData|cut -d":" -f2|tr -d '\t'|tr -d ' '|cut -d"k" -f1` echo -ne "[$curheap / $max_heap]\n" # In case of overriding HEAP size, kill process: if [ $curheap -ge $max_heap ]; then kill -9 $ipid 2>/dev/null # Kill process rm -rf $PIDSDIR/$ipid # Remove status file echo -ne "PID: $ipid KILLED to avoid lack !!\n" logger -p daemon.warning "[Garbage-Inspector]: PID $ipid has been killed because of a HEAP: $curheap kB." fi fi fi else echo -ne "[Not a valid process]\n" fi fi done cursec=`expr $cursec + 1` # Advance one second sleep 1s done # Remove lock file and exit logger -p daemon.info "[Garbage-Inspector]: Exiting at `date`" rm -rf $LOCKFILE exit 0 |
Code snippets for my npw-wrapper.c file changes
As I told you few lines earlier, nspluginwrapper has to call my own function called look_for_acroread() when the browser sends NPP_Destroy and NPP_Shutdown events, and as soon as a new Gplugin * object is created, also, as shown in cscope function-callers output:
File Function Line
0 npw-wrapper.c g_NPP_New 1817 look_for_acroread(2);
1 npw-wrapper.c g_NPP_Destroy 1893 look_for_acroread(1);
2 npw-wrapper.c g_NP_Shutdown 3339 look_for_acroread(0);
Let’s take a look at g_NPP_New() method, a “hook” for NPP_New() event:
(...) 1813 npw_printf("TCG: Instances: %d\n", instances); 1814 1815 if(strstr(g_plugin.name,"Adobe Reader")!=NULL) { 1816 syslog(LOG_DAEMON|LOG_INFO,"PDF instances: %d\n", instances); 1817 look_for_acroread(2); /* Get and show its .data segment size */ 1818 } 1819 (...) |
Here, look_for_acroread() just gets .data segment consumption for acroread process, storing its value on /var/log/nsplugin-pids/PID.
Next to this, inside g_NPP_Destroy() hook for NPP_Destroy() browser event, my own function calls, if needed, Garbage-Inspector, which is in charge of controlling all PIDS running as we know:
(...) 1892 /* Try to annotate "acroread" PID, if needed: */ 1893 look_for_acroread(1); 1894 1895 /* Decrement instances */ 1896 instances--; 1897 1898 /* Annotate in the syslog */ 1899 if(strstr(g_plugin.name,"Adobe Reader")!=NULL) 1900 syslog(LOG_DAEMON|LOG_INFO,"Removing instance, still: %d\n", instances); (...) |
To conclude, g_NPP_Shutdown just kills – using kill() system call – the current Acrobat Reader process.
(...) 3336 if (!g_plugin.is_wrapper) { 3337 3338 /* Try to kill Acroread */ 3339 look_for_acroread(0); 3340 3341 /* This piece of code is always executed, no matter what kind of plugin instance is 3342 * running right now. 3343 */ 3344 ret = invoke_NP_Shutdown(); 3345 plugin_exit(); /* Just kill this plugin instance, probably npvierwer.bin */ 3346 3347 } return ret; (...) Here is <em>look_for_acroread</em> function: <pre lang="c"> void look_for_acroread (int action){ DIR *dEntry; /* /proc object */ struct dirent *dirp; /* A directory entry */ FILE *fd; /* File stream */ int found_pid, annotate_file_fd, heap; pid_t acroread, garbage; /* ppid for each process running on the system */ char status_template[] = "/proc/%s/status" , file_status[PATH_MAX], file_status_option[PATH_MAX], *found, state[2], annotate_file[PATH_MAX], heap_str[PATH_MAX]; struct stat fs; if(g_plugin.name!=NULL){ /* Try to avoid SEGFAULTS */ found_pid=0; found = strstr(g_plugin.name,"Adobe Reader"); if(found!=NULL){ /* Just only if the plugin executed is Adobe Reader */ if(NULL!=(dEntry=opendir("/proc"))){ /* Browse inside /proc filesystem */ while((dirp=readdir(dEntry))!=NULL){ // Omit . and .. dir entries: if(strcmp(dirp->d_name,".")==0 || strcmp(dirp->d_name,"..")==0)continue; // Just inside a process, get file /proc/dirp->d_name/status: sprintf(file_status,status_template,dirp->d_name); // Try to open this file and get the line: 5, Ppid: ppid : if(NULL==(fd = fopen(file_status,"r")))continue; while(!feof(fd) && found_pid==0){ fscanf(fd,"%s",file_status_option); /* Get file entry */ /* Last option was Ppid ... ? */ if(strcmp(file_status_option,"PPid:")==0) { fscanf(fd,"%d",(int *)&acroread); /* We've got the ppid value as an integer: */ if(acroread==g_plugin.viewer_pid) { /* Just advance inside the stream in order to get VmData field : */ while(!feof(fd)){ fscanf(fd,"%s",file_status_option); /* Get the value and exit while : */ if(strcmp(file_status_option,"VmData:")==0){ fscanf(fd,"%d",(int *)&heap); break; } } /* Get annotate file ... */ sprintf(annotate_file,"/var/log/nsplugin-pids/%s",dirp->d_name); switch(action){ /* Don't annotate, simply kill it - Shutdown event dispatched */ case 0: npw_printf("TCG: Killing process %s\n", dirp->d_name); syslog(LOG_DAEMON|LOG_INFO,"TCG: Killing process %s\n", dirp->d_name); kill(atoi(dirp->d_name),SIGKILL); /* .. and that's all ! */ /* Delete PID annotate, if exists ... : */ unlink(annotate_file); break; /* Trivial check , if the process is Zombie, kill it ... : */ case 1: /* Store this PID - no matter what it's doing - */ npw_printf("TCG: Keeping an eye on PID: %s , sate: %s, Storing...\n", dirp->d_name, state); /* Report through syslog, also : */ syslog(LOG_DAEMON|LOG_INFO,"Keeping an eye on PID: %s", dirp->d_name); /* In case of Z, kill it : */ if(strcmp(state,"Z")==0){ syslog(LOG_DAEMON|LOG_ERR,"Killing zombie PID %s", dirp->d_name); npw_printf("TCG: Killing zombie, instances crashed: %d\n", instances); kill(atoi(dirp->d_name),SIGKILL); * Destroy the plugin and the viewer : */ plugin_kill(); plugin_killed = 0; /* Well, delete the annotate file : */ unlink(annotate_file); } else { /* If there's more than one instance, run garbage collector inspector: */ /* That's it 'cause of only when there are more than one instance running, the * event launched is NPP_Destroy, not NPP_Shutdown, and sometimes acroreads becomes * memory's leaker */ if(instances>1){ npw_printf("TCG: running Garbage Inspector ...\n"); /* Run via fork, don't wait ... : */ /* Only if the lock file doesn't exist: */ if((garbage=fork())<0) npw_printf("TCG: Queuing Garbage-Inspector: %s\n", strerror(errno)); else if (garbage == 0) { /* My child */ syslog(LOG_DAEMON|LOG_INFO,"Queuing Garbage-Inspector from plugin PID: %d ...", getpid()); execl("/usr/local/bin/garbage-inspector.sh", NULL); npw_printf("TCG: error running Garbage-Inspector\n"); } /* Now, Garbage-Inspector is running - or queued -... */ } } break; case 2: /* Report VmData segment size to syslog and stderr : */ npw_printf("TCG: HEAP size for PID: %s is %d\n", dirp->d_name,heap); /* Create the file in order to keep and eye on him */ if(-1!=(annotate_file_fd = open(annotate_file,O_CREAT|O_TRUNC|O_WRONLY,S_IRWXU))){ /* Store its heap - normal - consumption : */ sprintf(heap_str,"%d",heap); int bW = write(annotate_file_fd,heap_str,strlen(heap_str)); npw_printf("TCG: Bytes written: %d, %s\n", bW, strerror(errno)); close(annotate_file_fd); } /* Now, the "garbage collector" can acts over it */ /* The theory is kill always this PID when its heap consumption becomes 3*heap */ break; } found_pid=1; } } else if (strcmp(file_status_option,"State:")==0) fscanf(fd,"%s",state); /* We've got the process state right here */ } fclose(fd); if(found_pid==1)break; } closedir(dEntry); } else { npw_printf("TCG: error %s\n", strerror(errno)); } } }/* Acroread has been killed - or annotated to be killed in a near future !! */ } </pre> |
Using my nspluginwrapper
Simply compile & install it as usual:
./configure
make && make install
Then, install your plugins using:
nspluginwraper -v -i /path/to/your/nppdf.so file
Just before running your browser, create the log directory /var/log/nsplugin-pids and set it up with “the sticky bit”:
# mkdir /var/log/nsplugin-pids
# chmod 777 /var/log/nsplugin.pids/
#chmod +t /var/log/nsplugin-pids
What now?
My patch is running on five Linux boxes, two of them with Fedora Core 9 X86_64 distros, and the rest of them with Debian Etch 64. So far, so good. Despite the fact that it’s not a final code, just beta-software, I can assure you it’s working very well by now. Thus, I’ve decided to publish the entire nspluginwrapper 1.3.0 source code plus my additions and, at the same time, my Garbage-Inspector bash code.