Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md
arxiv_paper_collector.py	arxiv_paper_collector.py
papers_database.json	papers_database.json

Name

Last commit message

Last commit date

arxiv_paper_collector.py

papers_database.json

Research Collection Scripts

Automated tools for collecting and organizing RL + LLM research papers.

Scripts

`arxiv_paper_collector.py`

Automatically fetches recent papers from arXiv on:

Reinforcement Learning for Code Generation
RLHF and LLM Alignment
Program Synthesis with RL
LLM Reasoning with RL
AlphaCode and related work

Usage:

cd scripts
python3 arxiv_paper_collector.py

What it does:

Searches arXiv with targeted queries
Filters papers from 2022-2025
Removes duplicates
Organizes by topic
Generates markdown files with paper details
Saves JSON database for programmatic access

Output:

../Modern-RL-Research/*/PAPERS.md - Organized paper lists
papers_database.json - Complete database

Customization: Edit the SEARCH_QUERIES dictionary in the script to add/modify search terms.

Rate Limiting: The script respects arXiv's rate limits (3 seconds between requests).

Requirements

No external dependencies! Uses only Python standard library:

urllib - For API requests
xml.etree.ElementTree - For XML parsing
json - For data export

Python 3.6+ required.

Tips

Updating the collection:

# Run monthly to keep papers up-to-date
python3 arxiv_paper_collector.py

Filtering results: Adjust start_year and max_results parameters in the script.

Custom searches: Add your own queries to SEARCH_QUERIES dictionary.

JSON output: Use papers_database.json for custom analysis:

import json
with open('papers_database.json') as f:
    papers = json.load(f)

Future Enhancements

Potential additions:

Paper citation counts via Semantic Scholar API
Automatic GitHub repo detection
PDF download and text extraction
Duplicate detection across versions (v1, v2, etc.)
Email notifications for new papers
Integration with paper management tools (Zotero, etc.)

Part of the Study-Reinforcement-Learning repository

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Research Collection Scripts

Scripts

`arxiv_paper_collector.py`

Requirements

Tips

Future Enhancements

FilesExpand file tree

scripts

Directory actions

More options

Directory actions

More options

Latest commit

History

scripts

Folders and files

parent directory

README.md

Research Collection Scripts

Scripts

arxiv_paper_collector.py

Requirements

Tips

Future Enhancements

`arxiv_paper_collector.py`