Frontend code can be found here.
- Initial Channel Scraping (Every ~6 hours via crontab)
```mermaid
graph TD
    A[Crontab] --> B[index.js]
    B --> C[Puppeteer Browser]
    C --> D[YouTube Channel Pages]
    D --> E[Video Data]
    E --> F[temp/*_videos.json]
```
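The JSON files dropped into `temp/` are the contract between the Node scraper and the Python pipeline. As a rough sketch of that contract (the field names here are placeholders; the real schema is whatever `index.js` emits):

```python
import json
from pathlib import Path

# Hypothetical shape of one record in temp/<channel_id>_videos.json.
# The real schema is defined by index.js; these field names are
# placeholders chosen to match what the downstream pipeline needs.
example = [
    {
        "video_id": "dQw4w9WgXcQ",
        "title": "Example: election night coverage",
        "channel_id": "UC_example",
        "published": "2024-11-05T00:00:00Z",
    }
]

Path("temp").mkdir(exist_ok=True)
out = Path("temp") / "UC_example_videos.json"
out.write_text(json.dumps(example, indent=2))

# Downstream, process_stream.py only has to parse the same structure back.
videos = json.loads(out.read_text())
print(videos[0]["title"])
```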
- Video Processing Stream (Continuous)
```mermaid
graph TD
    A[process_stream.py] --> B[Watchdog Observer]
    B --> C[VideoDataHandler]
    C --> D[Detect new *_videos.json]
    D --> E[Filter target_keyword-related videos]
    E --> F[Update master_videos.csv]
    E --> G[video_data MongoDB]
    E --> H[Trigger comment scrape]
```
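A minimal sketch of this watcher using the `watchdog` library. The keyword value, CSV columns, and MongoDB database name (`youtube`) are assumptions, not the actual `process_stream.py` internals:

```python
import csv
import json
import time

from pymongo import MongoClient
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

TARGET_KEYWORD = "election"  # assumed value of target_keyword
video_data = MongoClient()["youtube"]["video_data"]  # database name is an assumption


class VideoDataHandler(FileSystemEventHandler):
    """React to new *_videos.json files dropped into temp/ by index.js."""

    def on_created(self, event):
        if event.is_directory or not event.src_path.endswith("_videos.json"):
            return
        # Real code should wait until index.js has finished writing the file.
        with open(event.src_path) as f:
            videos = json.load(f)
        # Keep only videos whose titles mention the target keyword.
        matches = [v for v in videos if TARGET_KEYWORD in v.get("title", "").lower()]
        if not matches:
            return
        with open("master_videos.csv", "a", newline="") as f:
            writer = csv.writer(f)
            for v in matches:
                writer.writerow([v["video_id"], v["channel_id"], v["title"]])
        video_data.insert_many(matches)
        # The comment scrape for each match would be triggered here
        # (see the next section).


observer = Observer()
observer.schedule(VideoDataHandler(), path="temp", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```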
- Comment Scraping Flow
```mermaid
graph TD
    A[process_stream.py] --> B[scrape_comments function]
    B --> C[comment_scrape.js]
    C --> D[Puppeteer with MITM Proxy]
    D --> E[xhr_scrape_ds.py intercepts]
    E --> F[Process XHR responses]
    F --> G[Save to CSV]
    G --> H[Move to data/channel_id/video_id/]
```
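The hand-off from Python to the Node scraper could look like the sketch below; the CLI argument passed to `comment_scrape.js` and the intermediate CSV filename are assumptions:

```python
import shutil
import subprocess
from pathlib import Path


def scrape_comments(channel_id: str, video_id: str) -> None:
    """Run the Puppeteer comment scraper for one video, then file the output."""
    # Launch the Node scraper; behind the MITM proxy, xhr_scrape_ds.py
    # intercepts the XHR comment responses and writes them to a CSV.
    subprocess.run(["node", "comment_scrape.js", video_id], check=True)

    # Move the resulting CSV into the per-channel, per-video layout.
    dest = Path("data") / channel_id / video_id
    dest.mkdir(parents=True, exist_ok=True)
    shutil.move(f"{video_id}_comments.csv", str(dest / "comments.csv"))
```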
- Analysis Pipeline
```mermaid
graph TD
    A[analysis.py] --> B[Load all data]
    B --> C[Join with channel-to-state mapping]
    C --> D[CommentAnalyzer]
    D --> E[Keyword analysis]
    D --> F[Sentiment analysis]
    D --> G[Engagement metrics]
    E & F & G --> H[MongoDB collections]
```
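A condensed sketch of this stage, assuming pandas for the joins, TextBlob as a stand-in sentiment library, and hypothetical column names (`channel_id`, `state`, `text`); the real `CommentAnalyzer` likely differs:

```python
import pandas as pd
from pymongo import MongoClient
from textblob import TextBlob  # stand-in sentiment library

db = MongoClient()["youtube"]  # database name is an assumption


def analyze(comments: pd.DataFrame, state_map: pd.DataFrame) -> None:
    # Geographic attribution: every comment inherits its channel's state
    # from the channel-to-state mapping.
    merged = comments.merge(state_map, on="channel_id", how="left")

    # Per-comment sentiment polarity in [-1, 1].
    merged["sentiment"] = merged["text"].map(lambda t: TextBlob(t).sentiment.polarity)

    # Engagement metrics aggregated per state, stored for the frontend.
    state_metrics = (
        merged.groupby("state")
        .agg(comment_count=("text", "count"), avg_sentiment=("sentiment", "mean"))
        .reset_index()
    )
    db["state_analysis"].insert_many(state_metrics.to_dict("records"))
```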
- Data Collection:
  - `index.js` scrapes channel video listings periodically
  - `process_stream.py` watches for new video data and manages the pipeline
  - `comment_scrape.js` + `xhr_scrape_ds.py` handle comment collection
- Data Processing:
  - Videos are filtered for election-related content
  - Comments are processed and organized by channel/video
  - Geographic attribution is maintained throughout
- Analysis:
  - `analysis.py` aggregates all data
  - `comment_analysis.py` provides specialized content analysis
  - Results are stored in MongoDB for the frontend to access
- MongoDB Collections (queried as sketched below):
  - `video_data`: Raw video information
  - `comments_with_video`: Processed comments with video context
  - `state_analysis`: State-level aggregated metrics
  - `video_analysis`: Video-level analysis results
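For example, a frontend or API layer could read these collections as follows; the collection names come from the list above, while the database name and query fields are illustrative:

```python
from pymongo import MongoClient

db = MongoClient()["youtube"]  # database name is an assumption

# Aggregated metrics for a single state.
ohio = db["state_analysis"].find_one({"state": "Ohio"})

# Video-level results for one channel, newest first; the "published"
# and "avg_sentiment" fields are illustrative.
videos = db["video_analysis"].find({"channel_id": "UC_example"}).sort("published", -1)
for v in videos:
    print(v["video_id"], v.get("avg_sentiment"))
```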
