extract headings to db #83

Open
superiums opened this issue Dec 4, 2023 · 6 comments

Comments

@superiums

How about extracting headings (marked with #, ... ######) to the db?
Headings are the main structure of a markdown file.
This helps structure the file and makes a local knowledge db valuable.
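A minimal sketch of the extraction step, assuming only ATX headings (# through ######) matter: each heading becomes a row with its level, text and 1-based line number. The names extractHeadings and HeadingRow are hypothetical; setext headings (underlined with === or ---) and headings inside fenced code blocks are not handled.

```typescript
// One row per ATX heading; the field names are illustrative only.
interface HeadingRow {
  level: number; // 1 for "#", 6 for "######"
  text: string;  // heading text without the leading hashes
  line: number;  // 1-based line number, kept in case it is useful later
}

function extractHeadings(markdown: string): HeadingRow[] {
  const rows: HeadingRow[] = [];
  // Up to 3 leading spaces, 1-6 hashes, then whitespace or end of line.
  // "#tagA" (no space) and "#######" (7 hashes) do not match.
  const atx = /^ {0,3}(#{1,6})(?:[ \t]+(.*?))?[ \t]*$/;
  markdown.split(/\r?\n/).forEach((raw, i) => {
    const m = atx.exec(raw);
    if (m) {
      // Drop an optional closing run of hashes, e.g. "## Title ##".
      const text = (m[2] ?? "").replace(/[ \t]+#+$/, "").trim();
      rows.push({ level: m[1].length, text, line: i + 1 });
    }
  });
  return rows;
}
```

For example, extractHeadings("# medicine Axxx\n\ntext\n\n## description: 3times per day.") returns two rows: level 1 "medicine Axxx" on line 1 and level 2 "description: 3times per day." on line 5. Each row maps directly onto a table with level, text and line columns.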

@rufuspollock
Member

@superiums Great suggestion. Do you have a specific structure in mind, e.g. do you want line numbers, etc.?

@superiums
Author

superiums commented Dec 8, 2023

Line numbers are usually not the first concern, e.g.:

# medicine Axxx

> this is for catch cold.

this medicine contains following element:
- element A
- element B
- element xxx

## description: 3times per day.

other text xxxxxxxxxxxxxxx


# medicine Bxxx

> this is for catch hot.

this medicine contains following element:
- element yyy
- element uuu
- element xxx

## description: not eat meat.

other text oooooooooooooooooooooooooooooooooo

yet an other text oooooooooooooooooooooooooooo

...

The document could be serialized to SQLite using the following fields:

  • heading 1 (the medicine name line here)
  • heading 2 (the description line here)
  • quote (the usage here)
  • list (the structured list here)
  • other text (the text after the last heading)
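A rough, line-based sketch of how the example above could be turned into one record per "# " section with the fields just listed. The SectionRecord shape and the serializeSections name are hypothetical, not an existing MarkdownDB schema; it assumes at most one "## " heading per section (a later one overwrites an earlier one) and treats only plain lines after the last heading as other text.

```typescript
// Illustrative record shape; these field names are not an existing MarkdownDB schema.
interface SectionRecord {
  heading1: string;
  heading2: string | null;
  quote: string | null;
  list: string[];
  otherText: string;
}

// Split a document into one record per "# " section (rough, line-based sketch).
function serializeSections(markdown: string): SectionRecord[] {
  const records: SectionRecord[] = [];
  const other: string[] = []; // plain lines since the most recent heading
  let current: SectionRecord | null = null;

  // Close the current section: keep only the text after its last heading.
  const flush = (): void => {
    if (current) {
      current.otherText = other.join("\n").trim();
      records.push(current);
    }
    other.length = 0;
  };

  for (const line of markdown.split(/\r?\n/)) {
    let m: RegExpExecArray | null;
    if ((m = /^#\s+(.*)$/.exec(line))) {
      flush();
      current = { heading1: m[1].trim(), heading2: null, quote: null, list: [], otherText: "" };
    } else if (!current) {
      continue; // ignore anything before the first "# " heading
    } else if ((m = /^##\s+(.*)$/.exec(line))) {
      current.heading2 = m[1].trim(); // a later "## " line overwrites an earlier one
      other.length = 0; // other text only counts after the last heading
    } else if ((m = /^>\s?(.*)$/.exec(line))) {
      current.quote = current.quote === null ? m[1] : `${current.quote} ${m[1]}`;
    } else if ((m = /^\s*[-*+]\s+(.*)$/.exec(line))) {
      current.list.push(m[1].trim());
    } else {
      other.push(line);
    }
  }
  flush();
  return records;
}
```

Applied to the medicine example, this yields one record for "medicine Axxx" and one for "medicine Bxxx", each a candidate row for an SQLite table.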

Which fields to serialize could be customized via command-line arguments.
Line numbers don't seem that important either: if a user needs a heading by its position, an SQL clause such as 'LIMIT 1 OFFSET 5' may work.

@superiums
Author

superiums commented Dec 8, 2023

This is efficient if the user needs to search for something in a specific place.
In this example, after serialization, the user could easily search records by:

  • heading 1
  • heading 2
  • quoting
  • listing

or something else. For common usage, every specially marked markdown element could work like this. The main ones are:

  • headings
  • quote
  • list
  • links (both markdown links [xxx](url) and wiki links [[filename#heading]])
  • tags ( eg. #tagA #tagB )
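The inline elements in this list (markdown links, wiki links, tags) can be pulled out with regular expressions. A minimal sketch under simplifying assumptions: no nesting, no awareness of code spans, images are skipped, and a tag is "#" followed by letters, digits, "_" or "-" at the start of a line or after whitespace. The names extractInlineRefs and allMatches are hypothetical.

```typescript
interface InlineRefs {
  links: { text: string; url: string }[];
  wikiLinks: { file: string; heading: string | null }[];
  tags: string[];
}

// Collect every match of a global regex (the regex must have the g flag).
function allMatches(re: RegExp, s: string): RegExpExecArray[] {
  const out: RegExpExecArray[] = [];
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(s)) !== null) out.push(m);
  return out;
}

function extractInlineRefs(markdown: string): InlineRefs {
  // [text](url); a leading "!" marks an image, which is filtered out.
  const links = allMatches(/(!?)\[([^\]]*)\]\(([^)\s]+)\)/g, markdown)
    .filter((m) => m[1] !== "!")
    .map((m) => ({ text: m[2], url: m[3] }));
  // [[file]] or [[file#heading]]
  const wikiLinks = allMatches(/\[\[([^\]#]+)(?:#([^\]]+))?\]\]/g, markdown)
    .map((m) => ({ file: m[1], heading: m[2] ?? null }));
  // "#tag" at line start or after whitespace; "# Heading" has a space, so it is not a tag.
  const tags = allMatches(/(?:^|\s)#([\w-]+)/gm, markdown).map((m) => m[1]);
  return { links, wikiLinks, tags };
}
```

Each of the three arrays could go into its own table keyed by file, which keeps searches like "all files tagged tagA" to a single query.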

@superiums
Author

After thinking about it more, I found that only serializing headings is necessary (at most adding a quote as a description).
Everything else could go into a content field.

Maybe like this:
  • markdowndb --extract-heading 3 would add these fields to the db: heading1, heading2, heading3, content
  • markdowndb --extract-heading 3 --extract-description 2 would add: heading1, heading2, heading3, description1, description2, content

The description is the quote line that follows the heading line.
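A sketch of the row layout these proposed flags could produce. The flags are the proposal above, not an existing markdowndb option, and toRows is a hypothetical name. Assumptions: one row per heading of level up to N; headingK holds the current level-K heading, so child rows repeat their parents; descriptionK is the first blockquote line directly after the level-K heading (blank lines skipped, only one quote line kept); everything else goes into content, and text before the first tracked heading is dropped.

```typescript
// Row keyed by heading1..N, description1..M and content.
type Row = Record<string, string | null>;

function toRows(markdown: string, maxHeading: number, maxDescription: number): Row[] {
  const headings: (string | null)[] = new Array(maxHeading).fill(null);
  const descriptions: (string | null)[] = new Array(maxHeading).fill(null);
  const rows: Row[] = [];
  let content: string[] = [];
  let pendingLevel = 0; // level of the heading still waiting for its quote line
  let started = false;  // text before the first tracked heading is ignored

  // Emit one row for the section that just ended.
  const emit = (): void => {
    if (!started) return;
    const row: Row = {};
    for (let k = 1; k <= maxHeading; k++) row[`heading${k}`] = headings[k - 1];
    for (let k = 1; k <= maxDescription; k++) row[`description${k}`] = descriptions[k - 1] ?? null;
    row.content = content.join("\n").trim();
    rows.push(row);
  };

  for (const line of markdown.split(/\r?\n/)) {
    const h = /^(#{1,6})\s+(.*)$/.exec(line);
    if (h && h[1].length <= maxHeading) {
      emit();
      const level = h[1].length;
      headings[level - 1] = h[2].trim();
      // A new heading resets deeper headings and the descriptions from its level down.
      for (let k = level; k < maxHeading; k++) headings[k] = null;
      for (let k = level - 1; k < maxHeading; k++) descriptions[k] = null;
      content = [];
      pendingLevel = level;
      started = true;
    } else if (pendingLevel > 0 && line.trim() === "") {
      continue; // skip blank lines between a heading and its quote
    } else if (pendingLevel > 0 && /^>\s?/.test(line)) {
      descriptions[pendingLevel - 1] = line.replace(/^>\s?/, "").trim();
      pendingLevel = 0;
    } else {
      pendingLevel = 0;
      if (started) content.push(line); // deeper headings also land here
    }
  }
  emit();
  return rows;
}
```

With N = 3 and M = 2, the "medicine Axxx" part of the example produces two rows: one for the "#" heading (description1 set from its quote) and one for the "##" heading, which repeats heading1 and description1 and holds "other text ..." as content.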

@mohamedsalem401
Contributor

@superiums
This can be done using MarkdownDB's new computed fields feature.

For example, the following computed field stores the first heading of a file:

import { visit, EXIT } from 'unist-util-visit';
import type { Heading, Root } from 'mdast';

function getFirstHeading(fileObject: any, ast: Root) {
  let firstHeading: Heading | null = null;

  // Use unist-util-visit to traverse the AST; the visitor only runs on 'heading' nodes
  visit(ast, 'heading', (node: Heading) => {
    firstHeading = node;
    // Stop visiting after finding the first heading
    return EXIT;
  });

  // Assign the header property to the found heading (null if the file has none)
  fileObject.header = firstHeading;
}
  
client.indexFolder("PATH", {computedFields: [getFirstHeading]})

You can add additional fields (or functions) to compute the first n headings and the first n descriptions as needed.
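A dependency-free sketch of that idea: a factory that returns a computed field with the same (fileObject, ast) shape as the snippet above, storing the first n headings and, for each, the blockquote directly after it as its description. It assumes the AST is an mdast-style tree where headings are top-level children of the root; MdNode is a minimal stand-in for the mdast types, and firstHeadings, headings and descriptions are hypothetical names.

```typescript
// Minimal mdast-shaped node type so the sketch runs without dependencies.
interface MdNode {
  type: string;
  depth?: number;     // set on "heading" nodes
  value?: string;     // set on "text" nodes
  children?: MdNode[];
}

// Concatenate the text inside a node.
function nodeText(node: MdNode): string {
  if (typeof node.value === "string") return node.value;
  return (node.children ?? []).map(nodeText).join("");
}

// Returns a computed-field function storing up to n headings and their descriptions.
function firstHeadings(n: number) {
  return (fileObject: Record<string, unknown>, ast: MdNode): void => {
    const top = ast.children ?? [];
    const headings: string[] = [];
    const descriptions: (string | null)[] = [];
    for (let i = 0; i < top.length && headings.length < n; i++) {
      if (top[i].type !== "heading") continue;
      headings.push(nodeText(top[i]));
      // Treat a blockquote directly after the heading as its description.
      const next = top[i + 1];
      descriptions.push(next && next.type === "blockquote" ? nodeText(next) : null);
    }
    fileObject.headings = headings;
    fileObject.descriptions = descriptions;
  };
}
```

It would be passed alongside getFirstHeading, e.g. computedFields: [getFirstHeading, firstHeadings(3)], assuming the computedFields option accepts any function of this shape.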

@rufuspollock
Member

@mohamedsalem401 this would be perfect for a small blogpost.
