
Manuscript Alert

Contributor
Research paper aggregator for Alzheimer's disease and neuroimaging researchers. Stay updated with the latest publications across multiple academic databases.
Figure: Manuscript Alert System showing paper search results with relevance scoring
What It Does

Aggregates papers from PubMed, arXiv, bioRxiv, and medRxiv, then filters them by keyword matching and ranks them by relevance score.

Data Sources: 4 APIs
Keyword Match: 2+ required
Scoring: Smart ranking
Current Stack: Python + Streamlit

Architecture
┌─────────────────────────────────────────────────────────────────┐
│                      STREAMLIT APP                              │
│  ┌───────────────┐  ┌───────────────┐  ┌──────────────────┐    │
│  │   Papers Tab  │  │  Models Tab   │  │   Settings Tab   │    │
│  │   (Search)    │  │  (Future RAG) │  │   (Keywords)     │    │
│  └───────────────┘  └───────────────┘  └──────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  PubMed Fetcher │  │  arXiv Fetcher  │  │ bioRxiv Fetcher │
│                 │  │                 │  │ (+ medRxiv)     │
│  NCBI E-utils   │  │  arXiv API      │  │  RSS Feeds      │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              ▼
                    ┌─────────────────┐
                    │ Keyword Matcher │
                    │                 │
                    │ - 2+ match rule │
                    │ - Relevance     │
                    │   scoring       │
                    └─────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │  Ranked Papers  │
                    │  (Streamlit UI) │
                    └─────────────────┘
Key Features
Multi-Source Fetching

Concurrent API calls to PubMed, arXiv, bioRxiv, and medRxiv for comprehensive coverage.

Performance: ThreadPoolExecutor enables parallel fetching, reducing wait time from minutes to seconds.
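
The fan-out pattern described here can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the four `fetch_*` functions are hypothetical stand-ins that return canned data (the real fetchers would make HTTP requests), and `fetch_all` is an assumed name. One fetcher deliberately fails to show that a single source going down need not block the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for the real fetchers; each would normally call a remote API.
def fetch_pubmed(query):
    return [{"source": "PubMed", "title": f"PubMed result for {query}"}]

def fetch_arxiv(query):
    return [{"source": "arXiv", "title": f"arXiv result for {query}"}]

def fetch_biorxiv(query):
    return [{"source": "bioRxiv", "title": f"bioRxiv result for {query}"}]

def fetch_medrxiv(query):
    # Simulate one source being unavailable.
    raise TimeoutError("simulated outage")

FETCHERS = {
    "PubMed": fetch_pubmed,
    "arXiv": fetch_arxiv,
    "bioRxiv": fetch_biorxiv,
    "medRxiv": fetch_medrxiv,
}

def fetch_all(query, fetchers=FETCHERS, max_workers=4):
    """Run every fetcher in its own thread and merge the results.

    Returns (papers, errors), where errors maps a source name to its error message.
    """
    papers, errors = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the source it belongs to.
        futures = {pool.submit(fn, query): name for name, fn in fetchers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                papers.extend(future.result())
            except Exception as exc:
                # Record the failure and keep results from the other sources.
                errors[name] = str(exc)
    return papers, errors

if __name__ == "__main__":
    papers, errors = fetch_all("alzheimer")
    print(len(papers), "papers;", "failed sources:", sorted(errors))
```

Threads suit this workload because the fetchers spend most of their time waiting on the network, so total wait time tends toward that of the slowest source rather than the sum of all of them.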
Smart Keyword Matching

Papers must match at least 2 keywords to be displayed, reducing noise significantly.

Algorithm: Compiled regex patterns for case-insensitive matching with relevance scoring based on match count.
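
A rough sketch of the 2+ match rule, under stated assumptions: the function names, the `title`/`abstract` dictionary keys, and the use of word boundaries (`\b`) are illustrative choices, not details confirmed by the project. The score here is simply the number of distinct keywords matched.

```python
import re

def compile_keywords(keywords):
    """Compile one case-insensitive, whole-word pattern per keyword."""
    return {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }

def matched_keywords(text, patterns):
    """Return the keywords whose pattern occurs at least once in text."""
    return [kw for kw, pattern in patterns.items() if pattern.search(text)]

def filter_papers(papers, keywords, min_matches=2):
    """Keep papers matching at least min_matches distinct keywords, best first."""
    patterns = compile_keywords(keywords)  # compiled once, reused for every paper
    kept = []
    for paper in papers:
        text = f"{paper['title']} {paper['abstract']}"
        hits = matched_keywords(text, patterns)
        if len(hits) >= min_matches:
            kept.append({**paper, "matched": hits, "score": len(hits)})
    return sorted(kept, key=lambda p: p["score"], reverse=True)

if __name__ == "__main__":
    sample = [
        {"title": "Amyloid PET imaging in Alzheimer's disease",
         "abstract": "We study tau and amyloid burden."},
        {"title": "Deep learning for MRI segmentation",
         "abstract": "A generic method."},
    ]
    for paper in filter_papers(sample, ["Alzheimer's", "amyloid", "tau", "MRI"]):
        print(paper["score"], paper["title"])
```

Compiling the patterns once up front avoids rebuilding them for every paper, which matters when a single search returns hundreds of abstracts.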
Relevance Scoring

Papers ranked by relevance to research interests, with scores visible in the UI.

Factors: Keyword frequency, journal quality, recency, and title/abstract weight.
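
One way the four factors might combine into a single number is shown below. Every weight, the 30-day recency window, the journal set, and the dictionary keys are assumptions made for illustration; the project's actual formula is not documented here.

```python
import re
from datetime import date

# All values below are hypothetical, chosen only to make the example concrete.
TITLE_WEIGHT = 2.0        # a keyword hit in the title counts double
ABSTRACT_WEIGHT = 1.0
JOURNAL_BONUS = 3.0
RECENCY_BONUS = 2.0
RECENCY_WINDOW_DAYS = 30
HIGH_IMPACT = {"nature", "jama", "brain", "radiology"}

def relevance_score(paper, keywords, today):
    """Combine keyword frequency, title/abstract weight, journal quality, and recency."""
    score = 0.0
    for kw in keywords:
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        score += TITLE_WEIGHT * len(pattern.findall(paper["title"]))
        score += ABSTRACT_WEIGHT * len(pattern.findall(paper["abstract"]))
    if (paper.get("journal") or "").strip().lower() in HIGH_IMPACT:
        score += JOURNAL_BONUS
    # Recency bonus decays linearly to zero over the window.
    age_days = (today - paper["published"]).days
    if 0 <= age_days < RECENCY_WINDOW_DAYS:
        score += RECENCY_BONUS * (1 - age_days / RECENCY_WINDOW_DAYS)
    return score

if __name__ == "__main__":
    paper = {
        "title": "Tau PET in Alzheimer's disease",
        "abstract": "Tau burden predicts decline; tau spreads.",
        "journal": "Brain",
        "published": date(2024, 1, 1),
    }
    print(relevance_score(paper, ["tau", "PET"], today=date(2024, 1, 16)))
```

Passing `today` explicitly, instead of calling `date.today()` inside the function, keeps the score reproducible and easy to test.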
Journal Quality Filter

Option to filter for high-impact journals only (Nature, JAMA, Brain, Radiology, etc.).

Use case: When you want only peer-reviewed, high-quality publications.
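
A journal filter of this kind could be as simple as an allowlist check. In the sketch below, the allowlist contains only the four journals named above, and the `journal` key and function name are assumptions. Matching is exact after normalization, so family titles such as "Nature Medicine" would need their own entries.

```python
# Hypothetical allowlist; the real list is longer ("etc." in the description above).
HIGH_IMPACT_JOURNALS = frozenset({"nature", "jama", "brain", "radiology"})

def filter_by_journal(papers, high_impact_only=False, allowlist=HIGH_IMPACT_JOURNALS):
    """Return all papers, or only those whose journal is on the allowlist."""
    if not high_impact_only:
        return list(papers)
    return [
        paper for paper in papers
        # Preprints typically carry no journal name, so they are dropped here.
        if (paper.get("journal") or "").strip().lower() in allowlist
    ]

if __name__ == "__main__":
    sample = [
        {"title": "Plasma biomarkers", "journal": "Nature"},
        {"title": "A preprint", "journal": None},
        {"title": "Case report", "journal": "Journal of Examples"},
    ]
    for paper in filter_by_journal(sample, high_impact_only=True):
        print(paper["title"])
```

With the filter on, results from the preprint servers (arXiv, bioRxiv, medRxiv) drop out as a side effect, which fits the peer-reviewed-only use case.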
v2 Roadmap
Status: Planning

A complete rewrite is planned, with a modern web stack and AI capabilities:

React + Django Architecture

Replace Streamlit with a React frontend and a Django REST backend for better UX, scalability, and deployment flexibility.

RAG Integration

Semantic similarity scoring using embeddings: instead of counting keyword matches, rank each manuscript by how close its meaning is to the researcher's interests.
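
Since v2 is still being planned, the following is only a sketch of the general idea, not the intended design. It ranks papers by cosine similarity between embedding vectors. The three-dimensional vectors are toy values invented for the example; in practice they would come from an embedding model, which is not specified here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(profile_vector, papers):
    """Sort papers by similarity between their embedding and the interest profile."""
    return sorted(
        papers,
        key=lambda p: cosine_similarity(profile_vector, p["embedding"]),
        reverse=True,
    )

if __name__ == "__main__":
    # Toy embeddings; real ones would have hundreds of dimensions.
    profile = [1.0, 0.0, 1.0]
    papers = [
        {"title": "Unrelated topic", "embedding": [0.0, 1.0, 0.0]},
        {"title": "Closely related topic", "embedding": [0.9, 0.1, 0.8]},
    ]
    for paper in rank_by_similarity(profile, papers):
        print(paper["title"])
```

Unlike the keyword rule, this approach can surface a paper that shares no exact terms with the researcher's keyword list, provided its embedding lies close to the interest profile.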

Project-Specific Knowledge Bases

Enable researchers to create domain-specific knowledge bases that improve over time with accumulated papers.

Planned Tech Stack:
React
Django
PostgreSQL
RAG Pipeline
Semantic Search
Current Tech Stack (v1)
Python
Streamlit
pandas
requests
BeautifulSoup
feedparser
concurrent.futures
PubMed API
arXiv API
bioRxiv API
Learn More