Latest Tweets

GNU/Linux 64 bits distros, nspluginwrapper and nppdf.so

Recently, we have discovered an awful BUG affecting nspluginwrapper on some GNU/Linux 64 bit distributions while using Acrobat Reader’s nppdf.so plugin in Iceweasel / Firefox browsers.

Bug details:

As long as a certain PDF document is showed inside a tab or window’s browser using nspluginwrapper (no matter what version) through Acrobat Reader 32 bit plugin 8.X., process “acroread” starts increasing its HEAP size memory consumption until the O.S enters in Trashing mode.

A brief analysis about nspluginwrapper’s code

First of all, as soon as the nspluginwrapper starts its execution thanks to NPP_Initialize NPAPI function call, it hooks all NPP calls to its own routines:

3210   plugin_funcs->size = sizeof(NPPluginFuncs);
3211   plugin_funcs->version = NPW_NPAPI_VERSION;
3212   plugin_funcs->newp = NewNPP_NewProc(g_NPP_New);
3213   plugin_funcs->destroy = NewNPP_DestroyProc(g_NPP_Destroy);
3214   plugin_funcs->setwindow = NewNPP_SetWindowProc(g_NPP_SetWindow);
3215   plugin_funcs->newstream = NewNPP_NewStreamProc(g_NPP_NewStream);
3216   plugin_funcs->destroystream = NewNPP_DestroyStreamProc(g_NPP_DestroyStream);
3217   plugin_funcs->asfile = NewNPP_StreamAsFileProc(g_NPP_StreamAsFile);
3218   plugin_funcs->writeready = NewNPP_WriteReadyProc(g_NPP_WriteReady);
3219   plugin_funcs->write = NewNPP_WriteProc(g_NPP_Write);
3220   plugin_funcs->print = NewNPP_PrintProc(g_NPP_Print);
3221   plugin_funcs->event = NewNPP_HandleEventProc(g_NPP_HandleEvent);
3222   plugin_funcs->urlnotify = NewNPP_URLNotifyProc(g_NPP_URLNotify);
3223   plugin_funcs->javaClass = NULL;
3224   plugin_funcs->getvalue = NewNPP_GetValueProc(g_NPP_GetValue);
3225   plugin_funcs->setvalue = NewNPP_SetValueProc(g_NPP_SetValue);

Reading NPAPI reference, NPP_Destroy event will be called as long as a new plugin instance has been closed – closing the tab, surfing to another URL location with different MIME type, etc -.

Inside NPP_Detroy event, any plugin must free all allocated memory in order to avoid memory lacks or excessive memory consumption. Thus, reading the sources, we can find out where all memory will be freed: in g_NPP_Destroy.

Finally, and only when there are no more instances running inside the browser, NPP_Shutdown method will be called from the browser, destroying – apart from all allocated memory -, npviewer.bin process, in charge of showing up the entire 32 bit plugin on the 64 bit browser. In particular, thinking about Acrobat Reader, the PDF document by itself:

3351 NPError
3352 NP_Shutdown(void)
3353 {
3354   NPError ret = g_NP_Shutdown();
3355
3356   npobject_bridge_destroy();
3357
3358   id_kill();
3359
3360   return ret;
3361 }

This issue can be demonstrated using pstree command:

 |-firefox-bin-+-npviewer.bin---ld-linux.so.2

Here, ld-linux.so is, in fact, the acroread’s pre-loader process. On some distros, so long as lsb libraries are installed, process “acroread” loads perfectly with no pre-loader like ld-linux.so.

The main problem is that, sometimes, in spite of killing npviewer.bin rightly after processing g_NPP_Destroy or g_NPP_Shutdown functions, acroread process starts increasing its .data segment size (because of consecutively calls to brk() system-call), until the entire O.S enters in Trashing mode.

The first approach: Garbage-Inspector

Instead of trying to determine what the hell is going on with acroread process, quite difficult, I preferred to develop a “Garbage-Inspector” process in order to kill acroread process (if needed), avoiding memory lack and system trashing.

To do this, there are two main parts of code: one of them placed just inside nspluginwrapper project (npw-wrapper.c file) and the another one outside, in a shell-script bash file called garbage-inspector.sh.

The first one is in charge of, as long as g_NPP_Destroy is called, run – if needed -, the “Garbage-Inspector” shell script. When a new Plugin * object is referenced and created, its .data segment memory consumption is updated and stored inside a file on /var/log/nsplugin-pids/PID.

Garbage-Inspector iterates all over these PIDs stored on /var/log/nsplugin-pids/ directory, comparing current .data process consumption of all of them with their previous values stored thanks to my nspluginwrapper added code. As soon as any of them become superior than 3data kB, Garbage-Inspector kills ’em and annotates this action in the syslog.

You can figure out how Garbage-Inspector and nspluginwrapper modified code work reading the syslog:

669 Feb 11 16:36:31 taltos firefox-bin: PDF instances: 1
670 Feb 11 16:36:47 taltos firefox-bin: PDF instances: 2
671 Feb 11 16:36:50 taltos firefox-bin: Keeping an eye on PID: 20323
672 Feb 11 16:36:50 taltos firefox-bin: Removing instance, still: 1
673 Feb 11 16:36:59 taltos firefox-bin: PDF instances: 2
674 Feb 11 16:37:02 taltos firefox-bin: Keeping an eye on PID: 20323
675 Feb 11 16:37:02 taltos firefox-bin: Removing instance, still: 1
678 Feb 11 16:39:38 taltos firefox-bin: Keeping an eye on PID: 20323
679 Feb 11 16:39:38 taltos firefox-bin: Killing zombie PID 20323
680 Feb 11 16:39:42 taltos firefox-bin: Removing instance, still: -1
681 Feb 11 16:40:21 taltos firefox-bin: PDF instances: 1
682 Feb 11 16:40:29 taltos firefox-bin: PDF instances: 2
683 Feb 11 16:40:33 taltos firefox-bin: Keeping an eye on PID: 23052
685 Feb 11 16:40:33 taltos firefox-bin: Removing instance, still: 1
686 Feb 11 16:40:33 taltos logger: [Garbage-Inspector]: Starting at Wed Feb 11 16:40:33 CET 2009
687 Feb 11 16:40:39 taltos firefox-bin: PDF instances: 2
688 Feb 11 16:40:45 taltos firefox-bin: Keeping an eye on PID: 23052
689 Feb 11 16:40:45 taltos firefox-bin: Removing instance, still: 1
<strong><span style="color: #ff0000;">690 Feb 11 16:41:49 taltos logger: [Garbage-Inspector]: PID 23052 has been killed using: 60096 k</span><span style="color: #ff0000;">B.</span></strong>
(...)
700 Feb 11 16:50:42 taltos logger: [Garbage-Inspector]: Exiting at WedFeb 11 16:50:42 CET 2009

How my Garbage-Inspector code looks like

Developed as a Shell-Script in BASH, Garbage-Inspector is based on lockfile , logger and trivial shell-scripting techniques. Its data is received trough /var/log/nsplugin-pids, one file per running acroread PID. As soon as no PIDS remain on this directory, Garbage-Inspector process finish its execution. As long as it’s running, there’s no much memory or processor consumption because of its own design. Here you have it:

PIDSDIR="/var/log/nsplugin-pids"
ITIME=600    # Time in seconds while garbage will be running.
LOCKFILE="$PIDSDIR/garbage-inspector.lock"
 
# Disable CTRL+C
trap '' 2
 
#
# Avoid running more than one instance. Check for the
# lock file, if exists, sleep 8 seconds and retry it again ... and again ...
#
lockfile -8 -r -1 $LOCKFILE
 
logger -p daemon.info "[Garbage-Inspector]: Starting at `date`"
 
#
# Now iterate for ITIME seconds in the system, after this time
# exit and remove lockfile
#
cursec=0
while [ $cursec -le $ITIME ]; do
echo "Second: $cursec"
#
# Iterate inside any PID file, open it up and get its VmData "normal"
# consumption
#
PIDS=`ls $PIDSDIR`
for ipid in $PIDS; do
#
# Get number of PIDS inside the dir:
#
NPIDS=`ls $PIDSDIR|wc -l`
if [ $NPIDS -le 1 ]; then
# There are no PIDS, so exiting ...
logger -p daemon.info "[Garbage-Inspector]: Exiting (no PIDS) at `date`"
rm -rf $LOCKFILE
exit 0
fi
if [ "$PIDSDIR/$ipid" != "$LOCKFILE" ]; then
echo -ne "Checking HEAP for PID $ipid ...: "
heap=`cat $PIDSDIR/$ipid`
max_heap=`expr $heap + $heap + $heap`    # Max allowed .data is 3heap
#
# Get current heap
#
if [ -d /proc/$ipid ]; then
#
# In case of Zombie, just kill'em:
#
if [ -O $PIDSDIR/$ipid ]; then
is_z=`cat /proc/$ipid/status |grep State|cut -d":" -f2|cut -d"(" -f1|tr -d '\t'`
if [ $is_z == "Z" ]; then
logger -p daemon.warning "[Garbage-Inspector]: PID $ipid has been killed because of becoming Zombie (Z)."
kill -9 $ipid 2&gt;/dev/null   # Kill process
rm -rf $PIDSDIR/$ipid       # Remove status file
else
curheap=`cat /proc/$ipid/status |grep VmData|cut -d":" -f2|tr -d '\t'|tr -d ' '|cut -d"k" -f1`
echo -ne "[$curheap / $max_heap]\n"
# In case of overriding HEAP size, kill process:
if [ $curheap -ge $max_heap ]; then
kill -9 $ipid 2&gt;/dev/null    # Kill process
rm -rf $PIDSDIR/$ipid        # Remove status file
echo -ne "PID: $ipid KILLED to avoid lack !!\n"
logger -p daemon.warning "[Garbage-Inspector]: PID $ipid has been killed because of a HEAP: $curheap kB."
fi
fi
fi
else
echo -ne "[Not a valid process]\n"
fi
fi
done
cursec=`expr $cursec + 1`    # Advance one second
sleep 1s
done
 
# Remove lock file and exit
logger -p daemon.info "[Garbage-Inspector]: Exiting at `date`"
rm -rf $LOCKFILE
exit 0

Code snippets for my npw-wrapper.c file changes

As I told you few lines earlier, nspluginwrapper has to call my own function called look_for_acroread() when the browser sends NPP_Destroy and NPP_Shutdown events, and as soon as a new Gplugin * object is created, also, as shown in cscope function-callers output:

File Function Line
0 npw-wrapper.c g_NPP_New 1817 look_for_acroread(2);
1 npw-wrapper.c g_NPP_Destroy 1893 look_for_acroread(1);
2 npw-wrapper.c g_NP_Shutdown 3339 look_for_acroread(0);

Let’s take a look at g_NPP_New() method, a “hook” for NPP_New() event:

(...)
1813   npw_printf("TCG: Instances: %d\n", instances);
1814
1815   if(strstr(g_plugin.name,"Adobe Reader")!=NULL) {
1816     syslog(LOG_DAEMON|LOG_INFO,"PDF instances: %d\n", instances);
1817     look_for_acroread(2);   /* Get and show its .data segment size */
1818   }
1819
(...)

Here, look_for_acroread() just gets .data segment consumption for acroread process, storing its value on /var/log/nsplugin-pids/PID.

Next to this, inside g_NPP_Destroy() hook for NPP_Destroy() browser event, my own function calls, if needed, Garbage-Inspector, which is in charge of controlling all PIDS running as we know:

(...)
1892   /* Try to annotate "acroread" PID, if needed: */
1893   look_for_acroread(1);
1894
1895   /* Decrement instances */
1896   instances--;
1897
1898   /* Annotate in the syslog */
1899   if(strstr(g_plugin.name,"Adobe Reader")!=NULL)
1900     syslog(LOG_DAEMON|LOG_INFO,"Removing instance, still: %d\n", instances);
(...)

To conclude, g_NPP_Shutdown just kills – using kill() system call – the current Acrobat Reader process.

(...)
3336   if (!g_plugin.is_wrapper) {
3337
3338     /* Try to kill Acroread */
3339     look_for_acroread(0);
3340
3341     /* This piece of code is always executed, no matter what kind of plugin instance is
3342      * running right now.
3343      */
3344     ret = invoke_NP_Shutdown();
3345     plugin_exit();  /* Just kill this plugin instance, probably npvierwer.bin */
3346
3347   } return ret;
(...)
 
Here is <em>look_for_acroread</em> function:
 
&lt;pre lang="c"&gt;
void look_for_acroread (int action){
 
	DIR *dEntry;  				/* /proc object */
  	struct dirent *dirp;  			/* A directory entry */
  	FILE *fd;             			/* File stream */
  	int found_pid,
	annotate_file_fd, heap;
  	pid_t acroread, garbage;       		/* ppid for each process running on the system */
        char status_template[] = "/proc/%s/status" ,
                 file_status[PATH_MAX],
                 file_status_option[PATH_MAX],
                 *found, state[2],
		 annotate_file[PATH_MAX], heap_str[PATH_MAX];
 
	struct stat fs;
 
    if(g_plugin.name!=NULL){	/* Try to avoid SEGFAULTS */
            found_pid=0;
            found = strstr(g_plugin.name,"Adobe Reader");
            if(found!=NULL){        /* Just only if the plugin executed is Adobe Reader */
    	        if(NULL!=(dEntry=opendir("/proc"))){	/* Browse inside /proc filesystem */
 	               while((dirp=readdir(dEntry))!=NULL){
    	               // Omit . and .. dir entries:
                       if(strcmp(dirp-&gt;d_name,".")==0 || strcmp(dirp-&gt;d_name,"..")==0)continue;
                       // Just inside a process, get file /proc/dirp-&gt;d_name/status:
                       sprintf(file_status,status_template,dirp-&gt;d_name);
        	           // Try to open this file and get the line: 5, Ppid: ppid :
                       if(NULL==(fd = fopen(file_status,"r")))continue;
                       	while(!feof(fd) &amp;&amp; found_pid==0){
                        	fscanf(fd,"%s",file_status_option);     /* Get file entry */
                            /* Last option was Ppid ... ? */
                            if(strcmp(file_status_option,"PPid:")==0) {
                            	fscanf(fd,"%d",(int *)&amp;acroread);
 
                                /* We've got the ppid value as an integer: */
                                if(acroread==g_plugin.viewer_pid) {
                                      /* Just advance inside the stream in order to get VmData field : */
					while(!feof(fd)){
						fscanf(fd,"%s",file_status_option);
						/* Get the value and exit while : */
						if(strcmp(file_status_option,"VmData:")==0){
							fscanf(fd,"%d",(int *)&amp;heap); break;
						}
					}
 
					/* Get annotate file ... */
					sprintf(annotate_file,"/var/log/nsplugin-pids/%s",dirp-&gt;d_name);
 
					switch(action){
						/* Don't annotate, simply kill it - Shutdown event dispatched */
						case 0:
		                                    npw_printf("TCG: Killing process %s\n", dirp-&gt;d_name);
		                                    syslog(LOG_DAEMON|LOG_INFO,"TCG: Killing process %s\n", dirp-&gt;d_name);
    		                                    kill(atoi(dirp-&gt;d_name),SIGKILL);	/* .. and that's all ! */
						    /* Delete PID annotate, if exists ... : */
						    unlink(annotate_file);
						break;
						/* Trivial check , if the process is Zombie, kill it ... : */
						case 1:
						/* Store this PID - no  matter what it's doing - */
						npw_printf("TCG: Keeping an eye on PID: %s , sate: %s, Storing...\n", dirp-&gt;d_name, state);
						/* Report through syslog, also : */
						syslog(LOG_DAEMON|LOG_INFO,"Keeping an eye on PID: %s", dirp-&gt;d_name);
						/* In case of Z, kill it : */
						if(strcmp(state,"Z")==0){
							syslog(LOG_DAEMON|LOG_ERR,"Killing zombie PID %s", dirp-&gt;d_name);
							npw_printf("TCG: Killing zombie, instances crashed: %d\n", instances);
							kill(atoi(dirp-&gt;d_name),SIGKILL);
							* Destroy the plugin and the viewer : */
							plugin_kill(); plugin_killed = 0;
							/* Well, delete the annotate file : */
							unlink(annotate_file);
						} else {
							/* If there's more than one instance, run garbage collector inspector: */
							/* That's it 'cause of only when there are more than one instance running, the
							* event launched is NPP_Destroy, not NPP_Shutdown, and sometimes acroreads becomes
							* memory's leaker
							*/
							if(instances&gt;1){
								npw_printf("TCG: running Garbage Inspector ...\n");
								/* Run via fork, don't wait ... : */
								/* Only if the lock file doesn't exist: */
								if((garbage=fork())&lt;0)
									npw_printf("TCG: Queuing Garbage-Inspector: %s\n", strerror(errno));
								else if (garbage == 0) {	/* My child */
									syslog(LOG_DAEMON|LOG_INFO,"Queuing Garbage-Inspector from plugin PID: %d ...", getpid());
									execl("/usr/local/bin/garbage-inspector.sh", NULL);
									npw_printf("TCG: error running Garbage-Inspector\n");
								}	/* Now, Garbage-Inspector is running - or queued -... */
							}
						}
						break;
						case 2:
						        /* Report VmData segment size to syslog and stderr : */
							npw_printf("TCG: HEAP size for PID: %s is %d\n", dirp-&gt;d_name,heap);
							/* Create the file in order to keep and eye on him */
							if(-1!=(annotate_file_fd = open(annotate_file,O_CREAT|O_TRUNC|O_WRONLY,S_IRWXU))){
								/* Store its heap - normal - consumption : */
								sprintf(heap_str,"%d",heap);
								int bW = write(annotate_file_fd,heap_str,strlen(heap_str));
								npw_printf("TCG: Bytes written: %d, %s\n", bW, strerror(errno));
								close(annotate_file_fd);
							}
							/* Now, the "garbage collector" can acts over it */
							/* The theory is kill always this PID when its heap consumption becomes 3*heap */
						break;
					} found_pid=1;
                                }
                            } else if (strcmp(file_status_option,"State:")==0)
				fscanf(fd,"%s",state);	/* We've got the process state right here */
                       	} fclose(fd);
                       if(found_pid==1)break;
                  } closedir(dEntry);
                } else { npw_printf("TCG: error %s\n", strerror(errno)); }
            }
    }/* Acroread has been killed  - or annotated to be killed in a near future !! */
}
&lt;/pre&gt;

Using my nspluginwrapper

Simply compile & install it as usual:

./configure

make && make install

Then, install your plugins using:

nspluginwraper -v -i /path/to/your/nppdf.so file

Just before running your browser, create the log directory /var/log/nsplugin-pids and set it up with “the sticky bit”:

# mkdir /var/log/nsplugin-pids
# chmod 777 /var/log/nsplugin.pids/
#chmod +t /var/log/nsplugin-pids

What now?

My patch is running on five Linux boxes, two of them with Fedora Core 9 X86_64 distros, and the rest of them with Debian Etch 64. So far, so good. Despite the fact that it’s not a final code, just beta-software, I can assure you it’s working very well by now. Thus, I’ve decided to publish the entire nspluginwrapper 1.3.0 source code plus my additions and, at the same time, my Garbage-Inspector bash code.

nsplugin-tcg-nolacktar

garbage-inspector