Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to persist the NODE_COMPILE_CACHE programmatically #54770

Open
javivelasco opened this issue Sep 4, 2024 · 12 comments · May be fixed by #54971
Open

Allow to persist the NODE_COMPILE_CACHE programmatically #54770

javivelasco opened this issue Sep 4, 2024 · 12 comments · May be fixed by #54971
Labels
feature request Issues that request new features to be added to Node.js.

Comments

@javivelasco
Copy link

What is the problem this feature will solve?

The implementation for NODE_COMPILE_CACHE is a long-awaited feature, and we are super happy to have it in Node.js 22. However, at the time of writing this, and unless I missed it somewhere, the cache can only be persisted on process shutdown.

This is an issue in serverless environments where a Node.js function starts in a VM and then gets paused/resumed on subsequent invocations until, at some point, it gets shut down. In this use case, the file system is ephemeral, and we can't easily hook into the process shutdown to persist the cache.

What is the feature you are proposing to solve the problem?

We propose having an API exposed in the module API to either:

  • Get a blob with the contents of the cache without having to persist it to disk.
  • Write the cache to disk.

cc @joyeecheung @QuiiBz

What alternatives have you considered?

We considered shutting down the process and resuming it within the VM to get the cache, but that leaves us with a cold new process. Although it would pick up the cache, it would still perform worse than a warm one.

@javivelasco javivelasco added the feature request Issues that request new features to be added to Node.js. label Sep 4, 2024
@joyeecheung
Copy link
Member

Write the cache to disk.

I think this would be preferable - we have a similar pattern with v8.takeCoverage() that flushes the coverage enabled by NODE_V8_COVERAGE to disk. Also, converting in-memory cache blobs to anything accessible to JS land has their own cost and introduce life-cycle complexities.

@joyeecheung
Copy link
Member

Potentially addresses the same request in #54465 (comment) (where I also commented that an API to just flush the cache would probably solve the problem).

@javivelasco
Copy link
Author

Thanks for such a quick answer @joyeecheung! Writing directly to disk is definitely ok, the goal is just to have a way to programmatically access the cache.

@joyeecheung
Copy link
Member

joyeecheung commented Sep 4, 2024

Draft API (I think in most cases, there is no point setting clearReadCache to false, as a loaded module is usually never recompiled, or it's recompiled for code changes which means the cache from the old source code should be ignored. But it doesn't seem to hurt to keep the option open)

/**
 * Flush the compile cache accumulated from loaded modules to disk.
 * 
 * Using the clearReadCache option is usually beneficial if the thread does not intend to
 * recompile a module with the same source code again, because once the module is compiled
 * in the thread, there is another tier of compiled module cache in Node.js/V8. Unless the module
 * will be purged from cache (e.g. by deleting from require.cache) and recompiled in the
 * same thread with the same source code, there is no point to keep the code cache read from
 * disk around, which can be redundant given that there is another tier of compiled module cache.
 *
 * Even if the code is meant to be re-compiled, if its source code is expected to change,
 * clearing the stale cache read from disk is still desirable.
 *
 * @param {boolean} clearReadCache Whether the in-memory cache should be cleared.
 */
function flushCompileCache(clearReadCache = true) {

}

This should probably also be done with some TODO I planned to do to write the cache to a temporary file and then rename it, to reduce the risk of race conditions.

@joyeecheung
Copy link
Member

joyeecheung commented Sep 4, 2024

After thinking about it a bit, there is also another way to deal with this - instead of having users flush the compile cache manually, we can introduce a mode where the cache is flushed as soon as it's compiled - this introduces latency to the first-time evaluation of the modules, instead of delaying that latency to process shutdown, but in a modern FS that's fast enough, the cost of writing is usually not that big, and we can also reduce it further in the future by moving the writes off-thread (that has been a TODO in the first place) - and when that happens, users get to enjoy the reduced overhead if they are already using the "write as soon as the cache is ready" mode, without changing their code again.

@javivelasco
Copy link
Author

I believe that for our specific use case, it might be more convenient to have the ability to decide when to flush the cache manually. Specifically, we are trying to optimize for faster cold boots, so introducing latency during the first-time evaluation is not very appealing, even if it's minimal. However, if the extra latency occurs only when there is no cache, it would be a one-time event, which might be acceptable.

The API draft looks great! :chefkiss:

@joyeecheung
Copy link
Member

joyeecheung commented Sep 5, 2024

However, if the extra latency occurs only when there is no cache, it would be a one-time event, which might be acceptable

Yes, basically the latency caused by write only happens when the cache is not already on disk, or if it's stale. The next time the same module is loaded with the cache and the cache is accepted (which means it's not stale) there would be no writes.

I think we can add the following:

  1. An API for manual flushing
  2. A mode for writing cache to disk right after compilation

You could use 1 for the time being, but 2 might actually be more optimal (can benchmark to find out). Also when we implement 2 as off-thread writes we can make it the default, I expect the cost of simply notifying another thread to write the cache to disk to be negligible. At that point 1 would usually be mostly no-ops - the cache are likely to all be written to disk by another thread by the time you get to evaluate some code to flush it.

@javivelasco
Copy link
Author

Sounds like the best of both worlds! :love-it:

@javivelasco
Copy link
Author

@joyeecheung Any plans to work on this soon? If not, I think I can work on it myself

@joyeecheung
Copy link
Member

I have a WIP here https://github.com/joyeecheung/node/tree/write-compile-cache - haven't tested it locally yet, just pushed it before I went on holidays. Planning to get back to it next week when I am back from holidays.

@javivelasco
Copy link
Author

I have a WIP here https://github.com/joyeecheung/node/tree/write-compile-cache - haven't tested it locally yet, just pushed it before I went on holidays. Planning to get back to it next week when I am back from holidays.

Awesome, thanks a lot! 🙏 Enjoy your holidays

@joyeecheung joyeecheung linked a pull request Sep 16, 2024 that will close this issue
@joyeecheung
Copy link
Member

Opened #54971

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js.
Projects
Status: Awaiting Triage
Development

Successfully merging a pull request may close this issue.

2 participants