
Figuring out how GC works in spidermonkey



I stumbled upon some garbage collection related questions in 0ad's code and spent a few hours trying to figure out how GC works. The documentation didn't help much, so I decided to write a small example program and answer my questions by testing.

I understand it better now but there are still some open questions. I thought I could share the program here and maybe someone can even answer some of the open questions.

Question: I'm quite sure that the value of str inside myjs_function2 should be "undefined" too, because by then the JavaScript variable textfromscript is out of scope. Why isn't it?

Output:


value of str inside myjs_function1: textfromscript
calling GC and waiting 2 sec...
value of intjsval inside myjs_function2: 7
value of str inside myjs_function2: textfromscript
return value:undefined
value of intjsval after script termination: 7
value of str after script termination: undefined

main.cpp

#include "jsapi.h"
#include <cstdio>     // fprintf, printf
#include <cstring>    // strlen
#include <unistd.h>   // sleep
#include <iostream>
#include <fstream>
#include <streambuf>
#include <sstream>

using namespace std;
jsval intjsval;
JSString* str;

const char* readFile();

/* The class of the global object. */
JSClass global_class = { "global", JSCLASS_GLOBAL_FLAGS, JS_PropertyStub, JS_PropertyStub, JS_PropertyStub, JS_StrictPropertyStub, JS_EnumerateStub, JS_ResolveStub, JS_ConvertStub, NULL, JSCLASS_NO_OPTIONAL_MEMBERS };

/* The error reporter callback. */
void reportError(JSContext *cx, const char *message, JSErrorReport *report) {
    fprintf(stderr, "%s:%u:%s\n",
            report->filename ? report->filename : "<no filename>",
            (unsigned int) report->lineno,
            message);
}

JSBool myjs_function1(JSContext *cx, uintN argc, jsval *vp)
{
    char *text;

    if (!JS_ConvertArguments(cx, argc, JS_ARGV(cx, vp), "S", &str))
        return JS_FALSE;

    text = JS_EncodeString(cx, str);
    cout << "value of str inside myjs_function1: " << text << endl;

    // Free the C-string copy made by JS_EncodeString (str itself belongs to the GC)
    JS_free(cx, text);

    JS_SET_RVAL(cx, vp, STRING_TO_JSVAL(str)); /* return the string itself */
    return JS_TRUE;
}

JSBool myjs_function2(JSContext *cx, uintN argc, jsval* vp)
{
    cout << "calling GC and waiting 2 sec..." << endl;
    JS_GC(cx);
    sleep(2);
    int count;
    if (!JS_ConvertArguments(cx, argc, JS_ARGV(cx, vp), "i", &count))
        return JS_FALSE;
    intjsval = INT_TO_JSVAL(count);
    cout << endl;

    // str is not rooted, so after JS_GC it may point to a collected string
    char* text;
    text = JS_EncodeString(cx, str);
    if (text == NULL)
        return JS_FALSE;
    cout << "value of intjsval inside myjs_function2: " << JSVAL_TO_INT(intjsval) << endl;
    cout << "value of str inside myjs_function2: " << text << endl;
    JS_free(cx, text);

    JS_SET_RVAL(cx, vp, JSVAL_VOID);
    return JS_TRUE;
}

JSFunctionSpec myjs_global_functions[] = {
JS_FS("function1", myjs_function1, 1, 0),
JS_FS("function2", myjs_function2, 1, 0),
JS_FS_END
};

int main(int argc, const char *argv[]) {

    /* JS variables. */
    JSRuntime *rt;
    JSContext *cx;
    JSObject *global;

    /* Create a JS runtime. */
    rt = JS_NewRuntime(8L * 1024L * 1024L);
    if (rt == NULL)
        return 1;

    /* Create a context. */
    cx = JS_NewContext(rt, 8192);
    if (cx == NULL)
        return 1;
    JS_SetOptions(cx, JSOPTION_VAROBJFIX | JSOPTION_METHODJIT);
    JS_SetVersion(cx, JSVERSION_LATEST);
    JS_SetErrorReporter(cx, reportError);
    // Enable debugging of GC-related errors (maximum garbage collection after each and every allocation)
#ifdef DEBUG
    JS_SetGCZeal(cx, 2);
#endif // DEBUG

    /* Create the global object in a new compartment. */
    global = JS_NewCompartmentAndGlobalObject(cx, &global_class, NULL);
    if (global == NULL)
        return 1;

    /* Populate the global object with the standard globals, like Object and Array. */
    if (!JS_InitStandardClasses(cx, global))
        return 1;

    if (!JS_DefineFunctions(cx, global, myjs_global_functions))
        return 1;

    /* Your application code here. This may include JSAPI calls to create your own custom JS objects and run scripts. */
    const char* script = readFile();
    //cout << script << endl;
    jsval rval;
    JSString *str; // NOTE: this local shadows the global str used by function1/function2
    JSBool ok;
    const char *filename = "unnamed";
    uintN lineno = 0;

    ok = JS_EvaluateScript(cx, global, script, strlen(script),
                           filename, lineno, &rval);
    if (!ok)
        return 1;

    str = JS_ValueToString(cx, rval);
    printf("\n return value:%s\n", JS_EncodeString(cx, str));

    //JS_GC(cx);
    //sleep(10);
    int count1 = JSVAL_TO_INT(intjsval);

    char* text;
    text = JS_EncodeString(cx, str);

    cout << "value of intjsval after script termination: " << count1 << endl;
    cout << "value of str after script termination: " << text << endl;
    /* End of application code */

    JS_DestroyContext(cx);
    JS_DestroyRuntime(rt);
    JS_ShutDown();
    return 0;
}


const char* readFile()
{
    ifstream file;
    char* content;
    int length;

    file.open("../../script.js", ifstream::in);
    if (!file)
        return "";  // could not open the script file

    // get length of file:
    file.seekg(0, ios::end);
    length = file.tellg();
    file.seekg(0, ios::beg);

    // allocate memory (+1 for null-termination)
    content = new char[length + 1];
    file.read(content, length);
    file.close();
    content[length] = '\0';

    return content;
}

script.js

function jsfunction()
{
    var textfromscript = "textfromscript";
    function1(textfromscript);
}

var countInJs = 7;
jsfunction();
function2(countInJs);


If I understand it correctly, str is a global C++ variable where you store some text in function1 using JS_ConvertArguments. I believe this copies the value to a new memory address, i.e. str and "textfromscript" are not pointing to the same memory address. That would mean that even after a GC that should destroy textfromscript, str still points to the correct text, and thus function2 can access it. It doesn't seem related to the GC.

The question is whether ConvertArguments actually copies the variables, but I'm assuming yes.

(edit: str is actually defined twice in your code, both as a global variable and a local one in "main", which might mess things up...)


Quote: (edit: str is actually defined twice in your code, both as a global variable and a local one in "main", which might mess things up...)

Argh. That's true. I also named this variable str and didn't notice it.

Now the third output of str is also "textfromscript".

Quote: I believe this copies the value to a new memory address, ie str and "textfromscript" are not pointing to the same memory address.

So I'm back where I started. No way to cause a GC error. :(


I'm not familiar with Spidermonkey's exact implementation, but I've implemented similar systems in the past. As a result, the following contains a fair number of assumptions.

A useful thing to know is that many GC designs don't depend on system malloc(), but keep around one or more large chunks of memory and do their own allocations inside those chunks (it's faster). This means that the memory is reused, and even if a certain object is "deleted" by the GC, it's still kept around in the process's memory until it's overwritten by another allocation.

Spidermonkey uses a simple type of "mark-and-sweep" GC (afaik) that doesn't move allocated objects around, so anything you allocate has a fixed memory address. Thus, you can access objects directly through C++ pointers such as JSString* without worrying that they might move... but once the GC reclaims that memory and reuses it, the C++ code is left holding invalid (dangling) pointers.

So, my guess is that you are dereferencing a dangling C++ pointer that points to an object in GC memory that has been "deleted" but is still in memory and hasn't been overwritten. In theory, if the JS keeps allocating new objects of similar size as that string, you'll eventually overwrite the memory and get nonsense output.


Yeah, I encountered the "shadow object" possibility, where something could make a GC-ed object actually reappear as the memory was still there. But I doubt your problem is linked to that.

Right, speaking of "shadow objects", there's also the possibility that since he's passing a constant, it gets initialised statically so it never gets deleted (though it depends on what those conversion functions do).


Quote: Right, speaking of "shadow objects", there's also the possibility that since he's passing out a constant, it gets initialised statically so it never gets deleted (though it depends on what those conversion functions do..).

Philip suggested that on IRC too, so I tried this:

function jsfunction(s1)
{
    var textfromscript = "textfromscript";
    function1(textfromscript + s1);
}
var countInJs = 7;
jsfunction("blablubb");
function2(countInJs);

I made two dumps: the first in function1 and the second after the script had terminated. I can see "textfromscript" in both dumps but "blablubb" only in the first. However, the output at the end (after the script has terminated) is still "textfromscriptblablubb".

I guess that means the object is actually copied, as wraitii said, and will always be available. Maybe JS_ConvertArguments prevents GC-related errors.


Does the dump show the raw memory contents, including deleted objects, or is it a liveness graph? If it's the latter, then it doesn't help prove or disprove what I wrote earlier...

If the object is copied, then it persists in the function2 call because of a memory leak and you'd need to free it manually.

You could create a bunch of short strings and see if it affects the output, e.g.:


function jsfunction(s1)
{
    var textfromscript = "textfromscript";
    function1(textfromscript + s1);
}
var countInJs = 7;
jsfunction("blablubb");

function rec(i)
{
    var s = Math.random().toString(36).substring(7);
    if (i > 0)
        rec(i - 1);
}
rec(10000); // allocate a bunch of strings

function2(countInJs);


Quote: Does the dump show the raw memory contents, including deleted objects, or is it a liveness graph?

Unfortunately I don't know what it dumps exactly. I wish it was documented better. :(

Quote: You could create a bunch of short strings and see if it affects the output. eg: ...

Up to 5000 recursions the output doesn't change, but when I set it to 10000 I get:


InternalError: too much recursion

lol :D


Tried running your code. For me, the JS_EncodeString call in function2 causes an "out of memory" error. I queried the length of the string on the line before that call and it returns "8769178900488".

The original "textfromscript" version of your code works fine, because the string is const. The "blablubb"/recursive versions cause the error, for the reason I explained earlier.

Don't know why you can't reproduce this, but at least the theory checks out.
