Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDS pipethrough optimizations #2770

Open
wants to merge 46 commits into
base: main
Choose a base branch
from

Conversation

matthieusieben
Copy link
Contributor

No description provided.

@matthieusieben matthieusieben force-pushed the msieben/micro-optimizations branch 4 times, most recently from 7d9350c to f629fc7 Compare September 2, 2024 18:39
@matthieusieben matthieusieben changed the title Micro optimizations in PDS PDS pipethrough optimizations Sep 6, 2024
@matthieusieben
Copy link
Contributor Author

matthieusieben commented Sep 6, 2024

Still need to make sure tests pass and add some more

@@ -43,15 +43,6 @@ export function ssrfFetchWrap<C = FetchContext>({
}

if (url.protocol === 'http:' || url.protocol === 'https:') {
// @ts-expect-error non-standard option
if (request.dispatcher) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dispatcher was never exposed on Request


if (allowUnknownTld !== true && !isValidDomain(url.hostname)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be performed through the lookup fn

Comment on lines 125 to 129
const url = new URL(urlStr)

if (!url.hostname) {
return undefined
}
Copy link
Collaborator

@devinivy devinivy Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for a url to start with http:// or https:// but successfully parse and not have a hostname? I wonder if we can use something that doesn't require throwing, e.g.

if (!URL.canParse(urlStr)) {
  return undefined
}
return urlStr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely

@@ -73,9 +47,9 @@ export async function parseStream(
)

if (!parser) {
throw createHttpError(400, 'Unsupported content-type')
throw new TypeError(`Unsupported content-type: ${type.mime}`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there's an error handler somewhere that needs to distinguish this TypeError, which should turn into a 415, from a TypeError coming from the runtime (e.g. null.x), which should turn into a 500. How would it distinguish them?

Copy link
Contributor Author

@matthieusieben matthieusieben Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled here:

try {
return await parseStream(readable, req.headers['content-type'], allow)
} catch (err) {
throw createHttpError(400, err, { expose: err instanceof TypeError })
}

The idea is that parseStream is generic (does not know if its parsing a request, a response, a file on disk, etc.) so the responsibility to convert the error falls onto the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As to distinguishing null pointer error, my hypothesis is that when using TypeScript, TypeError are more likely invalid input errors (e.g. new URL('ff')) rather than actual null pointer errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use a custom Error class for this particular purpose (because I don't want to throw something that is explicitely either a 415 or a 500). I just find TypeError to be handy and consistent with the way some native function work.

}
return new Uint8Array(Buffer.concat(bufs))
// Note Buffer is a subclass of Uint8Array
return Buffer.concat(chunks, totalLength)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for changing the return type here?

i've been bit by subtle differences between Buffers & Uint8Arrays before & kinda prefer using just one or the other in most of our stuff is possible. or at least differentiating between the two and saying "bytes" for Uint8Arrays and "buffer" for Buffers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer is a sub-class of Uint8Array. So it kinda makes no sense to wrap a Buffer within an Uint8Array since we loose functionalities & we create two objects instead of one.
Since this function is only used in the pipethrough code, I will rename the fn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that common-web already has a streamToBuffer (that actually performs a AsyncIterable<Uint8Array> to Uint8Array). I think it's best if we don't change that function's name so I named this one to streamToNodeBuffer ...

})
}
if (isHandlerPipeThroughBuffer(outputUnvalidated)) {
setHeaders(res, outputUnvalidated)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean

}
if (!Object.values(v['headers']).every(isString)) {
return false
const createValidator =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we avoided zod here because it can be kinda heavy duty for validation of stuff that is often invalid. I know it used to handle this by throwing and catching errors which can be rough on Node so we tried to keep it out of the happy/critical path of our server stuff.

But I also have some vague memory that they fixed this & made it faster? Any idea? Or any idea how these compare?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, validating here is way overkill. Since we know the input is of type HandlerOutput, all we need is to discriminate based on that. We can even make some of the checks we currently do disappear !

const createValidator =
<T>(schema: ZodSchema<T>) =>
(v: unknown): v is T =>
schema.safeParse(v).success
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a helper function for this that we use in most places in common-web/check

check.is(v, schema)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check.create(schema) as the semantics I needed are a bit different

return nsid.endsWith('/') ? nsid.slice(0, -1) : nsid // trim trailing slash
// /!\ Hot path

const originalUrl = ('originalUrl' in req && req.originalUrl) || req.url
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we rewrite as ternary? may be a bit easier to read on first glance

@@ -23,42 +30,52 @@ export const getLocalLag = (local: LocalRecords): number | undefined => {
return Date.now() - new Date(oldest).getTime()
}

export const handleReadAfterWrite = async <T>(
export const pipethroughReadAfterWrite = async <T>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dig this fn a lot

const safeAgent = new undici.Agent({
allowH2: cfg.fetch.allowHTTP2, // This is experimental
headersTimeout: cfg.fetch.headersTimeout,
maxResponseSize: cfg.fetch.maxResponseSize,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we cool re-using these config values for both the safeFetch & safeAgent?

only issue I could think of is if we have different tradeoffs on maxResponseSize

@@ -256,12 +263,36 @@ export class AppContext {
appviewCdnUrlPattern: cfg.bskyAppView?.cdnUrlPattern,
})

const safeAgent = new undici.Agent({
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost want to call this proxyAgent or safeProxyAgent or something similar to give the sense of what it's for since it's only use there

reqInit: RequestInit,
): Promise<Response> => {
let res: Response
type Accept = [name: string, flags: { q: string }]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on

type Accept = {
  name: string
  q: string
}

instead of the array syntax? seems more clear & type-safe

Copy link
Contributor Author

@matthieusieben matthieusieben Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of this code is to be agnostic of the accept-* header being negotiated.

Consider for example:

Accept-Patch: text/example;charset=utf-8, text/example

In that case, the Accept type could be generalized as such:

type Accept = [name: string, flags: Record<string, string>]

When there is an obvious case for generalization, I like to write generic code. I guess this boils down to coding style preferences.

I can make this code more straight forward if you prefer or if you think this is too much YAGNI ?

What type safety difference are you thinking of ?

Copy link
Collaborator

@dholms dholms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some small tweaks/questions, but broadstrokes this looks awesome 💪 super stoked to get this in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants