Improve performance #19

Open · jgm opened this issue Sep 30, 2019 · 4 comments

jgm commented Sep 30, 2019

See notes on performance in the README.md.

jgm commented Feb 9, 2020

What I've tried

  • rewriting to operate directly on Text instead of tokenizing first
  • rewriting to operate directly on Text, using megaparsec instead of parsec, and using the fast parsers takeWhileP etc. (a rough sketch of this approach is below)
  • rewriting to use ByteStrings instead of Texts in the Toks.

None of this achieved any speed improvement over the current version using [Tok]; indeed, in every case performance was worse.
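For reference, here is a minimal sketch of the second approach above: parsing Text directly with megaparsec and leaning on takeWhileP for the line-oriented primitives. This is illustrative only, not the branch that was actually benchmarked, and the module and parser names here are made up:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module RestOfLineSketch where

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type P = Parsec Void Text

-- Consume the rest of a line, including the newline if present,
-- in a single pass over the underlying Text via takeWhileP.
restOfLine :: P Text
restOfLine = do
  t  <- takeWhileP (Just "non-newline") (/= '\n')
  nl <- option "" ("\n" <$ char '\n')
  pure (t <> nl)
```

For instance, `parse restOfLine "" "hello\nworld"` evaluates to `Right "hello\n"`.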

Profiling reveals that block structure parsing is fast. Most of the time is taken up by tokenize and restOfLine (31%), and by inline parsing.
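For intuition about why tokenization shows up so high in the profile: a tokenizing pass has to touch every character of the input and allocate a token per run. A rough sketch of that kind of pass (illustrative only; the real Commonmark.Tokens.tokenize and its TokType differ in the details):

```haskell
module TokenizeSketch where

import Data.Char (isAlphaNum, isSpace)
import Data.Text (Text)
import qualified Data.Text as T

-- Illustrative token classes only; the real TokType differs.
data TokType = WordChars | Spaces | Symbol Char
  deriving (Show, Eq)

-- Walk the whole input, emitting one token per run of word
-- characters, run of spaces, or single symbol character.
tokenizeSketch :: Text -> [(TokType, Text)]
tokenizeSketch t = case T.uncons t of
  Nothing -> []
  Just (c, _)
    | isAlphaNum c ->
        let (w, rest) = T.span isAlphaNum t
        in  (WordChars, w) : tokenizeSketch rest
    | isSpace c ->
        let (s, rest) = T.span isSpace t
        in  (Spaces, s) : tokenizeSketch rest
    | otherwise ->
        (Symbol c, T.singleton c) : tokenizeSketch (T.drop 1 t)
```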

Instructions for profiling

make prof

Current results (March 12 2020):

| % time | cost centre |
| ---: | --- |
| 1.8 | parseChunks |
| 2.1 | pDelimChunk |
| 2.2 | Commonmark.Blocks.runInlineParser |
| 2.5 | blockContinues |
| 2.6 | Commonmark.Inlines.processBs |
| 2.9 | MAIN |
| 3.9 | block_starts |
| 6.6 | renderHtml |
| 9.0 | pSymbol |
| 11.9 | defaultInlineParser |
| 17.5 | Commonmark.Tokens.tokenize |
| 32.6 | restOfLine |

jgm commented Mar 13, 2020

For a 1.4MB file:

[screenshot: profiling results for the 1.4MB file, 2020-03-12]

jgm commented Mar 13, 2020

Benchmarks for different extensions:

| extension | mean |
| --- | --- |
| -xautolinks | 310.8 ms (309.3 ms .. 311.3 ms) |
| -xpipe_tables | 295.2 ms (293.2 ms .. 296.6 ms) |
| -xstrikethrough | 267.9 ms (265.6 ms .. 269.1 ms) |
| -xsuperscript | 267.8 ms (264.9 ms .. 269.5 ms) |
| -xsubscript | 266.8 ms (263.6 ms .. 267.9 ms) |
| -xsmart | 293.0 ms (292.0 ms .. 294.3 ms) |
| -xmath | 287.4 ms (285.4 ms .. 290.7 ms) |
| -xemoji | 281.6 ms (280.3 ms .. 282.8 ms) |
| -xfootnotes | 291.3 ms (286.1 ms .. 293.3 ms) |
| -xdefinition_lists | 272.6 ms (271.0 ms .. 275.4 ms) |
| -xfancy_lists | 271.2 ms (269.3 ms .. 273.8 ms) |
| -xattributes | 284.2 ms (283.4 ms .. 285.7 ms) |
| -xraw_attribute | 280.7 ms (279.6 ms .. 281.6 ms) |
| -xbracketed_spans | 268.5 ms (267.0 ms .. 269.4 ms) |
| -xfenced_divs | 269.6 ms (267.5 ms .. 271.6 ms) |
| -xauto_identifiers | 274.9 ms (273.0 ms .. 277.8 ms) |
| -ximplicit_heading_references | 269.8 ms (268.2 ms .. 272.8 ms) |
| -xall | 520.4 ms (515.5 ms .. 523.6 ms) |

jgm commented Aug 22, 2020

One idea to explore: use ShortText from the text-short package instead of Text in Tok.
The public API could still use Text.
This should reduce the memory used by the tokens.
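A minimal sketch of what that could look like, assuming the token keeps its contents in a strict field (the Tok and TokType definitions below are illustrative, not the library's exact types; the real Tok also carries a source position):

```haskell
module TokShortTextSketch where

import Data.Text (Text)
import Data.Text.Short (ShortText)
import qualified Data.Text.Short as TS

-- Illustrative token classes only.
data TokType = Spaces | LineEnd | WordChars | Symbol !Char
  deriving (Show, Eq)

-- Internal token: payload stored as a compact UTF-8 ShortText.
data Tok = Tok
  { tokType     :: !TokType
  , tokContents :: !ShortText
  } deriving (Show, Eq)

mkTok :: TokType -> Text -> Tok
mkTok ty = Tok ty . TS.fromText

-- Public API can keep exposing Text by converting at the boundary.
tokText :: Tok -> Text
tokText = TS.toText . tokContents
```

ShortText stores an unsliced, compact UTF-8 buffer (a ShortByteString under the hood), so each token would avoid Text's extra slice bookkeeping; converting back to Text only at the public boundary would keep the external API unchanged.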
