# CIRAL v1.0 mdpr-tied-pft-msmarco Indexes

These indexes were generated on 2024/02/12 at Pyserini commit `2154e7` on basilisk with the following command, run once per language:

```bash
# INPUT_DIR holds the per-language corpus file ("$lang".jsonl); INDEX_DIR receives the FAISS index.
lang=hausa # or yoruba, swahili, somali
encoder=castorini/mdpr-tied-pft-msmarco

python -m pyserini.encode   input   --corpus "$INPUT_DIR/$lang.jsonl" \
                                    --fields title text url \
                                    --delimiter "\n" \
                                    --shard-id 0 \
                                    --shard-num 1 \
                            output  --embeddings "$INDEX_DIR" \
                                    --to-faiss \
                            encoder --encoder $encoder \
                                    --encoder-class auto \
                                    --fields text \
                                    --batch 32 \
                                    --fp16
```
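Because the `lang=` comment lists four languages, the command has to be repeated for each one. The following is a minimal sketch of doing that in a single pass. It reuses exactly the flags shown above. The helper names `run` and `build_ciral_indexes`, the `DRY_RUN` switch, and the per-language output layout `"$INDEX_DIR/$lang"` are hypothetical conveniences, not part of Pyserini.

```bash
#!/usr/bin/env bash
# Hypothetical helper: encode all four CIRAL languages with the settings above.
# Assumes INPUT_DIR contains hausa.jsonl, yoruba.jsonl, swahili.jsonl, somali.jsonl.
# Assumed layout: each language's FAISS index is written to "$INDEX_DIR/$lang".

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

build_ciral_indexes() {
  # Abort early if the required directories are not set.
  : "${INPUT_DIR:?INPUT_DIR must be set}"
  : "${INDEX_DIR:?INDEX_DIR must be set}"
  local encoder=castorini/mdpr-tied-pft-msmarco
  local lang
  for lang in hausa yoruba swahili somali; do
    run python -m pyserini.encode \
      input   --corpus "$INPUT_DIR/$lang.jsonl" \
              --fields title text url \
              --delimiter "\n" \
              --shard-id 0 \
              --shard-num 1 \
      output  --embeddings "$INDEX_DIR/$lang" \
              --to-faiss \
      encoder --encoder "$encoder" \
              --encoder-class auto \
              --fields text \
              --batch 32 \
              --fp16
  done
}
```

Calling `DRY_RUN=1 INPUT_DIR=/path/to/ciral INDEX_DIR=/path/to/indexes build_ciral_indexes` prints the four commands so they can be checked before any encoding starts. Dropping `DRY_RUN=1` runs them one after another.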