
You Don't Need a Bloomberg Terminal: Building Your Financial Data Pipeline in Python

Previous lesson covered the three financial statements, nine key metrics, and how to compare companies within the same sector. You now know what to read. This lesson is about where the data comes from, how to get it programmatically, and how to build a system that keeps it flowing.

A Bloomberg Terminal costs $20,000 per year. It gives you real-time data on every asset class across every global exchange, a chat network connecting finance professionals, and a proprietary analytics suite. Over 325,000 terminals are active worldwide — nearly all of them inside hedge funds, investment banks, and asset management firms.

pip install yfinance costs $0. It gives you 132 financial attributes per ticker, 5 years of daily price history, income statements, balance sheets, and cash flow — all as pandas DataFrames. For a retail investor building a personal research pipeline, this covers 90% of what Bloomberg does.

The remaining 10% — real-time streaming, institutional-grade order flow, global bond markets — is what the $20K buys. But for learning and personal investing, you do not need it.


Core arguments of this lesson:

One: Financial data has three dimensions — fundamental, technical, and nonfinancial — and each answers a different question. Fundamental data (quarterly financials) tells you if a company is worth buying. Technical data (daily prices) tells you when to buy. Nonfinancial data (sentiment, VIX, insider flow) tells you what the crowd thinks and what is not in the numbers. You need all three.

Two: The platform landscape has four tiers, and you should start free. Free sources (Yahoo Finance, yfinance) cover personal research. Advisory platforms ($10-350/yr) add curated analysis. Fintech APIs ($0-1200/yr) give you programmatic access and unique data like sentiment scores. Enterprise ($20K+) is for institutions. Move up only when you hit a specific limitation.

Three: yfinance is your starting SDK — powerful, free, and sufficient for years. 132 attributes per ticker, three financial statements as DataFrames, historical prices, and dividend data. Learn its capabilities and limitations before spending money on commercial APIs.

Four: A data pipeline is not a one-time script — it is a continuous monitoring system. Your investment thesis can break at any time. Google Trends tracks public attention shifts. Alpha Vantage NEWS_SENTIMENT scores quantify media mood. Combining price data with nonfinancial signals creates an early warning system for thesis invalidation.


1. Three Data Types: Your Three Information Dimensions

Every investment decision depends on three kinds of data. Think of them as three monitoring layers for a production system:

| Data Type | Frequency | Examples | Question It Answers |
|---|---|---|---|
| Fundamental | Quarterly / Annual | Revenue, EPS, D/E, P/E, cash flow | Is this company worth buying? |
| Technical | Real-time / Daily | Price, volume, moving averages, RSI | When should I buy or sell? |
| Nonfinancial | Varies | News sentiment, social media, ESG, insider trades | What does the crowd think? What risks are not in the numbers? |

The programmer analogy maps cleanly:

  • Fundamental data = unit tests. They verify correctness — is the business fundamentally sound? You run them quarterly (earnings reports) and they give you a pass/fail on financial health. High latency, high signal.
  • Technical data = monitoring and alerting. Real-time dashboards tracking price, volume, and momentum indicators. Like Datadog or Grafana — they tell you what the system is doing right now, not whether the architecture is correct. Low latency, noisier signal.
  • Nonfinancial data = user feedback and support tickets. Sentiment analysis, Google Trends, insider transactions — qualitative signals that reveal what the numbers cannot. A company can have perfect financials while users are flooding Reddit with complaints about the product. This is the context layer.

No single layer is sufficient. A company with great fundamentals can still drop 30% on a sentiment shift (strong unit tests, but users are leaving). A stock with perfect technical momentum can collapse when quarterly earnings miss (monitoring looks great, but the code has a critical bug). The three layers together give you a complete picture.

2. The Platform Landscape: From Free to Bloomberg

Data platforms for investors fall into four tiers. The cost curve is steep, and each tier has a specific sweet spot.

| Tier | Cost | Examples | Best For | Limitation |
|---|---|---|---|---|
| Free | $0 | Yahoo Finance, Google Finance | Quick lookups, basic screening, beginners | No API (officially), data can be delayed |
| Advisory | $10-350/yr | Seeking Alpha ($240), Motley Fool ($99-349) | Curated analysis, community insights, stock picks | Opinion-driven, must verify independently |
| Fintech APIs | $0-1200/yr | EODHD ($240-1200), Alpha Vantage (free-paid), OpenBB (free) | Programmatic access, automation, unique data endpoints | Rate limits on free tiers, coverage varies |
| Enterprise | $20K+/yr | Bloomberg Terminal, Refinitiv | Institutional — real-time, global, everything | If you need to ask the price, you cannot afford it |

Decision framework

Start with yfinance (free tier). It wraps Yahoo Finance data into Python objects. Sufficient for learning, backtesting, and personal portfolio analysis.

Move to Fintech APIs when you hit a specific wall:

  • Need international exchange coverage? EODHD covers 77 exchanges.
  • Need sentiment data? Alpha Vantage has a unique NEWS_SENTIMENT endpoint.
  • Want one interface across multiple providers? OpenBB lets you swap data sources with a single parameter change.

Advisory platforms are for reading, not building. Seeking Alpha's community articles and quant ratings save time on qualitative research, but they do not give you a data pipeline.

Enterprise is for when you manage other people's money. Not before.

3. Jupyter Notebooks: Your Research IDE

Before we pull data, set up the environment. Jupyter Notebooks are the standard research IDE for financial analysis — interactive Python with code, output, and markdown in a single document.

The workflow:

  • One notebook per research question. "Is NVDA undervalued?" gets its own notebook. "Compare retail sector liquidity" gets another. Do not dump everything into one giant file.
  • Run cells incrementally. Pull data in cell 1, clean it in cell 2, analyze in cell 3. Inspect the DataFrame at each step. This is REPL-driven development — exactly how you would debug in a Python shell, but persistent and shareable.
  • Export to HTML or PDF for sharing findings with non-technical collaborators.

Google Colab = free hosted Jupyter. No local setup needed — open a browser, start writing Python. GPU available if you get into ML-based analysis later. All the code in this lesson runs in Colab.

4. yfinance Deep Dive: 132 Free Attributes Per Ticker

yfinance is the requests library of financial data — not the most robust, not the most feature-rich, but the one you reach for first because it just works.

Core Data Access

import yfinance as yf

ticker = yf.Ticker("NVDA")

# Financial statements (annual) — returns pandas DataFrames
income_stmt = ticker.income_stmt        # Revenue, COGS, Net Income...
balance_sheet = ticker.balance_sheet     # Assets, Liabilities, Equity
cash_flow = ticker.cash_flow            # Operating, Investing, Financing

# Ticker info — dict with ~132 attributes
info = ticker.info
# Includes: marketCap, trailingPE, forwardPE, pegRatio, dividendYield,
#           debtToEquity, currentRatio, quickRatio, returnOnEquity,
#           profitMargins, sector, industry, fullTimeEmployees, etc.

# Historical prices — OHLCV DataFrame
hist = ticker.history(period="5y")
# Columns: Open, High, Low, Close, Volume, Dividends, Stock Splits

Each statement comes as a DataFrame where rows are field names (Total Revenue, Net Income, Total Debt, etc.) and columns are reporting periods. Ready for time series analysis out of the box. We used these in Lesson 2 to compute metrics manually — this lesson focuses on building the pipeline around them.
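For instance, year-over-year revenue growth drops out of that row/column layout in two lines. The sketch below builds a frame of the same shape by hand (the figures are made up for illustration) so it runs without a network call; with a live ticker you would use `ticker.income_stmt` directly:

```python
import pandas as pd

# Same shape as ticker.income_stmt: rows are field names, columns are periods,
# newest period first. Figures below are invented for illustration.
income_stmt = pd.DataFrame(
    {"2023": [60.9e9, 29.8e9], "2022": [27.0e9, 4.4e9], "2021": [26.9e9, 9.8e9]},
    index=["Total Revenue", "Net Income"],
)

revenue = income_stmt.loc["Total Revenue"]    # one row -> a Series over periods
yoy_growth = revenue / revenue.shift(-1) - 1  # compare each period to the next-older one
print(yoy_growth.round(3))
```

The oldest period has no predecessor, so its growth is NaN — exactly the kind of edge you catch by inspecting the DataFrame cell by cell in a notebook.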

collect_ratios(): The Quick-Scan Function

The .info dictionary gives you roughly 132 pre-computed attributes. Instead of pulling one at a time, grab everything you need in a single call:

def collect_ratios(symbol):
    """Collect key financial metrics for a single stock."""
    t = yf.Ticker(symbol)
    info = t.info
    return {
        'P/E (TTM)': info.get('trailingPE'),
        'P/E (FWD)': info.get('forwardPE'),
        'PEG': info.get('pegRatio'),
        'P/S': info.get('priceToSalesTrailing12Months'),
        'P/B': info.get('priceToBook'),
        'ROA': info.get('returnOnAssets'),
        'ROE': info.get('returnOnEquity'),
        'Profit Margin': info.get('profitMargins'),
        'Current Ratio': info.get('currentRatio'),
        'Quick Ratio': info.get('quickRatio'),
        'D/E': info.get('debtToEquity'),
        'Dividend Yield': info.get('dividendYield'),
        'Payout Ratio': info.get('payoutRatio'),
        'EPS (TTM)': info.get('trailingEps'),
        'Market Cap': info.get('marketCap'),
    }

This is your quick-scan function — run it on any ticker to get a full health check in one call. Feed results into a pandas DataFrame for multi-company comparison:

import pandas as pd

tickers = ['NVDA', 'AAPL', 'KO', 'WMT', 'MSFT']
df = pd.DataFrame({t: collect_ratios(t) for t in tickers})
print(df.to_string())

Manual Ratio Computation from Raw DataFrames

Sometimes .info numbers are delayed, missing, or use a definition you do not agree with. Computing directly from the three statement DataFrames gives you full control:

import yfinance as yf

t = yf.Ticker("NVDA")
bs = t.balance_sheet
is_ = t.income_stmt
cf = t.cash_flow

# Liquidity
current_ratio = bs.loc['Current Assets'].iloc[0] / bs.loc['Current Liabilities'].iloc[0]
quick_ratio = (bs.loc['Current Assets'].iloc[0] - bs.loc['Inventory'].iloc[0]) / bs.loc['Current Liabilities'].iloc[0]

# Debt
de_ratio = bs.loc['Total Debt'].iloc[0] / bs.loc['Stockholders Equity'].iloc[0]
interest_coverage = is_.loc['EBIT'].iloc[0] / abs(is_.loc['Interest Expense'].iloc[0])

# Earnings
eps = is_.loc['Net Income'].iloc[0] / t.info['sharesOutstanding']
fcf = cf.loc['Operating Cash Flow'].iloc[0] - cf.loc['Capital Expenditure'].iloc[0]
fcf_per_share = fcf / t.info['sharesOutstanding']

# Profitability
roa = is_.loc['Net Income'].iloc[0] / bs.loc['Total Assets'].iloc[0]
roe = is_.loc['Net Income'].iloc[0] / bs.loc['Stockholders Equity'].iloc[0]
profit_margin = is_.loc['Net Income'].iloc[0] / is_.loc['Total Revenue'].iloc[0]

The advantage: you know exactly which numerator and denominator went into every ratio. .info's debtToEquity might define "debt" differently from what you intend. Manual calculation removes ambiguity.

Ticker Naming Conventions

yfinance supports international exchanges using suffix notation:

| Exchange | Format | Example |
|---|---|---|
| US (NYSE/NASDAQ) | Plain symbol | NVDA, AAPL |
| London | Symbol.L | RR.L (Rolls-Royce) |
| Germany | Symbol.DE | ALV.DE (Allianz) |
| Tokyo | Symbol.T | 7203.T (Toyota) |
| Hong Kong | Symbol.HK | 0700.HK (Tencent) |

If you are analyzing non-US stocks, always append the exchange suffix. yf.Ticker("7203.T").history(period="1y") pulls Toyota's data from the Tokyo Stock Exchange.

Simple Returns vs Log Returns

Two ways to calculate returns from price data. Which one you use depends on what you are doing next.

Simple returns — the intuitive percentage change:

hist = yf.Ticker("NVDA").history(period="1y")
simple_returns = hist['Close'].pct_change()  # (P1 - P0) / P0

Log returns — the one quants actually use:

import numpy as np
log_returns = np.log(hist['Close'] / hist['Close'].shift(1))

Why log returns matter:

  1. Additive across time. Sum daily log returns to get monthly log returns. Simple returns do not have this property — you must compound them multiplicatively. Additive operations are simpler and more numerically stable.
  2. Symmetric. A log return of +x and a log return of -x cancel exactly. Simple returns do not: +10% followed by -10% leaves you below where you started (1.10 × 0.90 = 0.99, a 1% loss).
  3. Better for statistical modeling. Log returns are closer to a normal distribution, which matters for any model that assumes normality (most of classical finance).

For quick analysis, simple returns are fine. For anything involving statistics, time aggregation, or modeling — use log returns.
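Both properties show up in a three-line numeric check. This sketch uses synthetic prices (+10% then -10%) rather than live data:

```python
import numpy as np

prices = np.array([100.0, 110.0, 99.0])    # +10% day, then -10% day

simple = prices[1:] / prices[:-1] - 1       # [0.10, -0.10]
logret = np.log(prices[1:] / prices[:-1])

# Simple returns must be compounded multiplicatively...
total_simple = np.prod(1 + simple) - 1      # -0.01: you did NOT break even
# ...while log returns simply add across time.
total_log = np.sum(logret)

# Both conventions agree on the total once you convert back.
assert np.isclose(np.exp(total_log) - 1, total_simple)
print(total_simple, total_log)
```

Summing the simple returns would have given 0.0 — the wrong answer. Summing the log returns gives ln(0.99), which converts back to the correct -1% total.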

Descriptive Statistics and Volatility

returns_stats = simple_returns.agg(['mean', 'std', 'var'])
# mean = average daily return
# std  = volatility (standard deviation)
# var  = variance

Standard deviation of returns is the primary measure of risk in finance. Compare volatility across stocks to understand relative risk:

| Company | Std Dev (2023) | Interpretation |
|---|---|---|
| NuScale (SMR) | 0.799 | Extremely volatile — nuclear startup |
| NVDA | ~0.35 | High volatility — AI hype cycle |
| KO | 0.135 | Low volatility — defensive consumer staple |

NuScale (std 0.799) is nearly 6x more volatile than Coca-Cola (std 0.135). In programmer terms: NuScale is the startup that deploys to production three times a day and occasionally takes the site down. Coca-Cola is the legacy monolith that ships quarterly and never crashes.

Higher volatility means wider daily price swings. For a long-term investor, volatility creates buying opportunities. For a short-term trader, volatility is both risk and opportunity. Your time horizon determines whether volatility is a feature or a bug.
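One convention worth knowing when you compare volatility figures: daily standard deviations are usually annualized by multiplying by the square root of 252, the approximate number of US trading days in a year. A sketch on synthetic daily returns (the 2% daily volatility is an arbitrary illustration):

```python
import numpy as np

# Synthetic year of daily returns: mean 0.05%, daily volatility 2%
rng = np.random.default_rng(42)
daily_returns = rng.normal(loc=0.0005, scale=0.02, size=252)

daily_vol = daily_returns.std(ddof=1)         # sample standard deviation
annualized_vol = daily_vol * np.sqrt(252)     # standard annualization convention
print(f"daily {daily_vol:.4f} -> annualized {annualized_vol:.2%}")
```

Always check whether a published volatility number is daily or annualized before comparing it against your own — the two differ by a factor of roughly 16.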

yfinance Limitations

Know the tool's constraints before relying on it:

  • Scraping-based. yfinance pulls data from Yahoo Finance's website. When Yahoo changes their HTML or internal API structure, yfinance can break. This has happened multiple times.
  • Rate limiting. Too many requests in a short window will get you temporarily blocked.
  • Data accuracy not guaranteed. Yahoo Finance is not the authoritative source for financial data. They aggregate from multiple providers, and errors happen.
  • No real-time streaming. Data is always slightly delayed. Not suitable for day trading or high-frequency strategies.
  • Community-maintained. This is not an official Yahoo product. It depends on volunteer contributors keeping up with Yahoo's changes.

For personal research and learning, these limitations rarely matter. For production systems or automated trading, you need a paid API with an SLA.
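Two of these constraints — rate limits and intermittent breakage — can be softened with a local cache plus retry-with-backoff. A minimal, library-agnostic sketch; `fetch_with_retry`, `_CACHE`, and the wrapped call are all hypothetical names of mine, not part of yfinance:

```python
import time

_CACHE = {}

def fetch_with_retry(fetch, *args, retries=3, backoff=1.0, **kwargs):
    """Call `fetch`, caching successes and retrying failures with exponential backoff."""
    key = (getattr(fetch, "__name__", "fn"), args, tuple(sorted(kwargs.items())))
    if key in _CACHE:
        return _CACHE[key]                      # cached: no repeat network hit
    for attempt in range(retries):
        try:
            result = fetch(*args, **kwargs)
            _CACHE[key] = result
            return result
        except Exception:
            if attempt == retries - 1:
                raise                           # out of retries, surface the error
            time.sleep(backoff * 2 ** attempt)  # wait backoff, then 2x, 4x, ...

# Usage (hypothetical): fetch_with_retry(lambda s: yf.Ticker(s).info, "NVDA")
```

For daily-refresh research this is usually enough; production trading systems still need a paid API with an SLA.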

5. Commercial Data Sources: When yfinance Is Not Enough

Four platforms worth knowing about, each with a distinct strength.

Finviz: The Screener

US stocks only. $24.96/month for Elite. 70+ screening filters (P/E, market cap, sector, dividend yield, technical patterns). Visual stock charts with pattern annotations and heat maps.

Best for: screening the US market quickly. "Show me all tech stocks with PEG under 1.0 and ROE above 15%" — Finviz does this in seconds with its web UI. Elite subscribers get REST API access and CSV export.

Limitation: US-only. No international exchanges.

EODHD (End of Day Historical Data)

77 exchanges worldwide. Freemium — free tier gives 20 API calls per day. Clean REST API returning JSON.

import requests

API_KEY = "your_api_key"
url = f"https://eodhd.com/api/eod/AAPL.US?api_token={API_KEY}&fmt=json"
data = requests.get(url).json()

Best for: international coverage. If you are analyzing stocks on the London, Frankfurt, Tokyo, or Hong Kong exchanges, EODHD has the data. EOD prices, fundamentals, dividends, splits, and options.

Alpha Vantage: The Sentiment API

Free tier gives 25 calls per day. Simple REST API with good documentation. The unique feature is the NEWS_SENTIMENT endpoint — no other free API offers this.

from alpha_vantage.timeseries import TimeSeries

ts = TimeSeries(key='YOUR_API_KEY', output_format='pandas')
data, meta = ts.get_daily(symbol='AAPL', outputsize='full')

The NEWS_SENTIMENT endpoint returns sentiment scores (-1.0 to 1.0) for news articles mentioning a specific ticker. More on this in Section 8.

OpenBB: The Adapter Pattern

Open source. Multi-asset: stocks, bonds, crypto, forex. The key architectural insight: one interface, multiple backends.

from openbb import obb

# Same function call, different data providers — swap with one parameter
data_yf = obb.equity.price.historical("AAPL", provider="yfinance")
data_av = obb.equity.price.historical("AAPL", provider="alpha_vantage")
data_eodhd = obb.equity.price.historical("AAPL", provider="eodhd")

Programmers will recognize this immediately — it is the adapter pattern. OpenBB abstracts away provider-specific API differences behind a unified interface. You can compare data quality across sources with a single parameter change.

Comparison Table

| Feature | Finviz | EODHD | Alpha Vantage | OpenBB |
|---|---|---|---|---|
| Cost | $24.96/mo | $0-1200/yr | Free (25/day) + paid | Free (open source) |
| Coverage | US only | 77 exchanges | US + major intl | Multi-asset |
| Unique strength | Screening UI | International breadth | NEWS_SENTIMENT | Provider abstraction |
| API style | REST (Elite) | REST/JSON | REST/pandas | Python SDK |
| Best for | US screening | Global equities | Sentiment analysis | Multi-source pipeline |

6. Library Evaluation: Do Not Be Fooled by Star Count

When choosing a financial data library or API, GitHub stars are the least important signal. A library with 10K stars and no commits in 18 months will break before a library with 500 stars and weekly releases.

Eight criteria that actually matter:

| Criterion | What to Check | Why It Matters |
|---|---|---|
| Recency | When was the last release? Last commit? | Financial data APIs change constantly. Abandoned libraries break fast. |
| Community | GitHub contributors, issues response time, Stack Overflow activity | A solo-maintainer project is one burnout away from abandonment. |
| Cross-platform | Windows/Mac/Linux support | If you develop on Mac and deploy on Linux, this matters. |
| Exchange coverage | US-only vs international? How many exchanges? | Your needs may expand beyond US equities. |
| Documentation | English docs? Working code examples? API reference quality? | Bad docs mean hours of trial-and-error. |
| Data freshness | Real-time vs EOD vs 15-min delayed? | Depends on your strategy — long-term investors need EOD; day traders need real-time. |
| Reliability | Official API vs scraping? Rate limits? Uptime history? | Scraping-based tools break without warning. Official APIs have SLAs. |
| Cost | Free tier limitations? How does pricing scale? | Some APIs charge per call, others per month. Model your usage first. |

Decision framework by use case

  • Personal research, learning → yfinance. Free, sufficient, well-documented.
  • International coverage → EODHD or OpenBB.
  • Sentiment analysis → Alpha Vantage (NEWS_SENTIMENT).
  • US stock screening → Finviz.
  • Production systems → paid APIs with SLAs (EODHD paid tier, Bloomberg for institutions).

7. Nonfinancial Data: Signals Beyond the Numbers

Financial statements and price data are backward-looking. By the time revenue shows up in a quarterly report, the trend that drove it has been underway for months. Nonfinancial data can give you earlier signals.

VIX: The Market's Fear Gauge

The CBOE Volatility Index (VIX) measures expected 30-day volatility of the S&P 500, derived from options prices. Traders call it the "Fear Index."

| VIX Range | Market Mood | What to Watch |
|---|---|---|
| < 15 | Complacent — low fear | Historically, complacency precedes corrections. Not a buy signal. |
| 15-25 | Normal | Standard operating range. |
| 25-35 | Elevated fear | Volatility rising. Hedging costs increasing. Potential opportunity forming. |
| > 35 | Extreme fear / panic | Historically, the best buying opportunities occur here — if you can stomach the drawdown. |

VIX above 35 accompanied the Lehman Brothers collapse (2008) and the COVID crash (March 2020); during Black Monday (1987), which predates the VIX itself, implied-volatility measures spiked far higher still. In every case, buying during the panic and holding for 12+ months produced exceptional returns. The catch: it requires conviction to buy when everyone around you is selling.
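The bands in the table translate directly into a small classification helper. A sketch (the thresholds are the ones above; `vix_regime` is a hypothetical name, and in practice you would feed it the latest ^VIX close from yfinance):

```python
def vix_regime(vix: float) -> str:
    """Map a VIX level to the mood bands described in the table above."""
    if vix < 15:
        return "complacent"
    if vix < 25:
        return "normal"
    if vix < 35:
        return "elevated fear"
    return "extreme fear"

# e.g. vix = yf.Ticker("^VIX").history(period="1d")["Close"].iloc[-1]
print(vix_regime(12.3), vix_regime(48.0))
```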

Sentiment Data

Alpha Vantage's NEWS_SENTIMENT endpoint scores news articles from -1.0 (extremely bearish) to +1.0 (extremely bullish) for specific tickers. Tracking sentiment over time can surface:

  • Sentiment divergence from price. Price rising but sentiment declining? The market may be running on momentum rather than conviction. Watch for a reversal.
  • Sentiment shock. A sudden drop in sentiment after a specific event (earnings miss, lawsuit, management departure) quantifies how severe the market perceives the impact.

Movement and Flow Data

  • Options flow — large unusual options activity can signal institutional positioning. A sudden spike in call volume on a stock before an announcement suggests someone knows something.
  • Dark pool prints — trades executed off-exchange, typically by institutions trying to move large blocks without moving the price. Dark pool volume as a percentage of total volume can indicate institutional accumulation or distribution.

Demographic and Economic Data

Census data, employment figures, consumer confidence indices — these are macro-level signals that affect entire sectors. Rising home construction permits? Building materials companies benefit. Aging population demographics? Healthcare spending increases.

Enterprise Sources

Bloomberg, Refinitiv, and FactSet aggregate all of the above into unified platforms. They charge $20K+ per year because they save institutional investors the time of stitching together dozens of free and semi-free sources. For a retail investor, stitching together yfinance + Alpha Vantage + public economic data gets you 80% of the way there.

8. Continuous Monitoring: Investing Is Not a One-Time Query

In Lesson 1, we established that programmers can automate what others do manually. This section is where that advantage becomes concrete.

An investment thesis is a hypothesis. Like any hypothesis, it can be invalidated by new data. You need a monitoring system — not a one-time analysis.

from pytrends.request import TrendReq

pytrends = TrendReq()
pytrends.build_payload(["LiDAR", "autonomous driving"], timeframe='today 12-m')
interest = pytrends.interest_over_time()

Google Trends measures relative search interest over time — a proxy for public attention. Use cases:

  • Hype cycle detection. If search interest for "LiDAR" spikes 5x in a month, the sector may be entering a hype phase. Prices tend to overshoot during hype and correct afterward.
  • Trend confirmation. Your thesis says autonomous driving adoption is accelerating. If Google Trends for "autonomous driving" has been flat or declining for 12 months, the thesis may be stale.
  • Competitive monitoring. Compare search interest across competing technologies or companies to detect mindshare shifts.

Alpha Vantage NEWS_SENTIMENT: Quantifying Media Mood

import requests

url = "https://www.alphavantage.co/query"
params = {
    "function": "NEWS_SENTIMENT",
    "tickers": "LAZR,INVZ",
    "apikey": API_KEY,
}
response = requests.get(url, params=params).json()
# Returns sentiment scores per article, per ticker

The response includes individual article scores, relevance scores per ticker, and publication timestamps. You can:

  • Track sentiment over time for a specific company. Build a rolling 30-day average sentiment score. Compare it against price movement.
  • Monitor sentiment around events. Earnings announcements, product launches, executive departures — how did sentiment shift?
  • Detect thesis-breaking signals. Your thesis on a LiDAR company depends on continued industry optimism. If LAZR sentiment has been trending negative for 3 months while price is flat, the market may be ahead of the sentiment shift.
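The rolling-average idea in the first bullet is a few lines of pandas. This sketch uses synthetic daily scores; real ones would come from parsing the NEWS_SENTIMENT response, and the uniform range here is purely illustrative:

```python
import numpy as np
import pandas as pd

# Synthetic daily sentiment scores in [-0.2, 0.4] for 90 days
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=90, freq="D")
sentiment = pd.Series(rng.uniform(-0.2, 0.4, size=90), index=dates)

rolling_30d = sentiment.rolling(window=30).mean()  # NaN until 30 days of history exist
latest = rolling_30d.iloc[-1]
print(f"Latest 30-day average sentiment: {latest:+.3f}")
```

Plot `rolling_30d` next to the closing price and divergences between the two lines become visible at a glance.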

Use Case: Monitoring Thesis Health

Combine these tools into a thesis health monitor:

  1. Google Trends — is public interest in your thesis sector growing, flat, or declining?
  2. NEWS_SENTIMENT — is media coverage of your specific company positive, neutral, or negative? Is it trending?
  3. Price + volume — is the stock confirming or diverging from sentiment?

  • Declining sentiment + declining price → thesis is breaking. Time to re-evaluate.
  • Declining sentiment + rising price → momentum trade, not conviction. Fragile.
  • Rising sentiment + rising price → thesis is confirmed. Hold or add.
  • Rising sentiment + declining price → market disagrees with the crowd. Investigate why.
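Those four quadrants reduce to a two-input lookup. A hypothetical helper (`thesis_health` is my name for it; the trend booleans would come from, say, the sign of a 30-day slope in your sentiment and price series):

```python
def thesis_health(sentiment_rising: bool, price_rising: bool) -> str:
    """Classify the four sentiment/price quadrants described above."""
    if not sentiment_rising and not price_rising:
        return "thesis breaking: re-evaluate"
    if not sentiment_rising and price_rising:
        return "momentum, not conviction: fragile"
    if sentiment_rising and price_rising:
        return "thesis confirmed: hold or add"
    return "market disagrees with the crowd: investigate"

print(thesis_health(sentiment_rising=False, price_rising=True))
```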

This is the investing equivalent of a CI/CD dashboard. Green does not mean everything is perfect — it means your tests are still passing. Red does not mean the project is dead — it means you need to investigate.

9. Putting It Together: The Pipeline Architecture

A complete financial data pipeline mirrors any data engineering system:

Collection → Storage → Analysis → Signals

Collection Layer

Pull data from multiple sources on a schedule:

  • yfinance: daily prices, quarterly financials, ticker info (free)
  • Alpha Vantage: news sentiment scores (free tier: 25 calls/day)
  • pytrends: Google Trends interest data (free, rate-limited)
  • FRED (Federal Reserve Economic Data): macro indicators — VIX, interest rates, unemployment (free API)

Storage Layer

For personal research, CSV files or a SQLite database are sufficient. Each data type gets its own table:

| Table | Update Frequency | Source |
|---|---|---|
| daily_prices | Daily (after market close) | yfinance |
| financials | Quarterly (after earnings) | yfinance |
| sentiment | Daily or event-driven | Alpha Vantage |
| trends | Weekly | pytrends |
| macro | Monthly | FRED |

Do not over-engineer the storage layer. A flat CSV-per-ticker approach works fine for portfolios under 100 stocks. Move to SQLite or Postgres when you need joins across dimensions.
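When you do outgrow flat files, the table layout above maps to a few lines of stdlib SQLite. A minimal sketch — the column names are illustrative choices of mine, not prescribed by any tool:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path like "pipeline.db" in practice
conn.executescript("""
CREATE TABLE daily_prices (ticker TEXT, date TEXT, close REAL, volume INTEGER,
                           PRIMARY KEY (ticker, date));
CREATE TABLE sentiment    (ticker TEXT, date TEXT, score REAL,
                           PRIMARY KEY (ticker, date));
""")
conn.execute("INSERT INTO daily_prices VALUES ('NVDA', '2024-01-02', 481.68, 41000000)")
conn.execute("INSERT INTO sentiment VALUES ('NVDA', '2024-01-02', 0.31)")

# The payoff over CSV: joins across dimensions in one query.
row = conn.execute("""
    SELECT p.close, s.score
    FROM daily_prices p
    JOIN sentiment s ON p.ticker = s.ticker AND p.date = s.date
""").fetchone()
print(row)
```

The composite primary keys make inserts idempotent-ish (duplicates fail loudly instead of silently doubling your data), which matters for a pipeline that runs on a schedule.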

Analysis Layer

Combine data types to answer investment questions:

  • Fundamental screen: filter stocks by P/E, PEG, ROE, D/E
  • Technical overlay: add moving averages, RSI, Bollinger Bands to screened candidates
  • Sentiment check: compare price trends against NEWS_SENTIMENT and Google Trends

Signal Layer

Generate alerts when conditions are met:

  • Interest Coverage drops below 2.0 → liquidity warning
  • Sentiment score crosses below -0.3 for 5 consecutive days → thesis check
  • Price crosses below 200-day SMA → technical weakness
  • VIX crosses above 30 → market-wide fear, potential opportunity
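Each alert rule is just a predicate over the latest values in your store. A sketch with hypothetical metric keys of my own naming, encoding the four rules above:

```python
def check_signals(metrics: dict) -> list[str]:
    """Return triggered alerts for one ticker's latest metrics (keys are illustrative)."""
    alerts = []
    if metrics.get("interest_coverage", float("inf")) < 2.0:
        alerts.append("liquidity warning")
    if metrics.get("sentiment_below_neg03_days", 0) >= 5:
        alerts.append("thesis check")
    if metrics.get("price", 0) < metrics.get("sma_200", 0):
        alerts.append("technical weakness")
    if metrics.get("vix", 0) > 30:
        alerts.append("market-wide fear")
    return alerts

print(check_signals({"interest_coverage": 1.5, "price": 95, "sma_200": 100, "vix": 12}))
```

Run it after each daily collection job and pipe non-empty results to email or a messaging webhook, and the pipeline becomes a monitoring system rather than a static report.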

Your Free Stack

| Component | Tool | Cost |
|---|---|---|
| Financial data | yfinance | $0 |
| Sentiment | Alpha Vantage (free tier) | $0 |
| Trends | pytrends | $0 |
| Macro data | FRED API | $0 |
| Research IDE | Google Colab | $0 |
| Screening | Finviz (free tier) | $0 |
| Storage | CSV / SQLite | $0 |

Total cost: $0. Total capability: enough to do serious fundamental, technical, and nonfinancial analysis on any public company in the US market, with limited international coverage.

Hands-On Exercises

Open the Colab notebook and run through six exercises covering every tool from this lesson:

→ Open Lesson 3 Colab Exercises

  1. Batch pull 10 companies with collect_ratios()
  2. Manual ratio computation from raw DataFrames
  3. Simple vs Log Returns + volatility comparison
  4. VIX Fear Index 5-year analysis
  5. Google Trends search interest tracking
  6. Global ticker pulls (London, Frankfurt, Tokyo, Hong Kong)

What Comes Next

This pipeline collects and stores data. The next lessons in this series will use it to:

  • Build an investment thesis from scratch (growth portfolio construction)
  • Evaluate dividend sustainability (income portfolio construction)
  • Backtest trading strategies (SMA crossover, multi-factor scoring)
  • Build interactive dashboards (Streamlit + Plotly visualization)

The data pipeline is the foundation. Everything that follows — analysis, strategy, execution — reads from it.

Closing

Bloomberg Terminal users have a $20K/year advantage in data breadth, speed, and integration. They do not have an advantage in data understanding.

The free tools covered in this lesson — yfinance, Alpha Vantage, pytrends, Finviz — give you access to the same fundamental data, the same price history, and increasingly similar alternative data. What separates a useful pipeline from a useless one is not the data source. It is whether the person using it knows which questions to ask.

A data pipeline with no thesis is a data lake. A thesis with no data pipeline is a guess.

Build both.


This series is based on Stefan Papp's Investing for Programmers — all companion code is open source.

Previous: Reading the Source Code of Companies | Next: Building a Growth Portfolio — From Macro Thesis to Stock Picks


© Xingfan Xia 2024 - 2026 · CC BY-NC 4.0