Sources & Methodology

Data provenance for every statistic on the dashboard
← Back to dashboard

Summary

This page documents every figure and chart that appears on the dashboard, where the underlying data comes from, and exactly how it's derived. Items fall into four kinds:

1. Complaint Volume by Borough — 2025 Verified

Borough2025 Street Light Complaints
Brooklyn8,501
Queens8,255
Bronx6,751
Manhattan6,199
Staten Island3,974
Unspecified155
NYC Total33,835

Source: NYC OpenData — 311 Service Requests from 2010 to Present (dataset erm2-nwe9), maintained by NYC 311 and updated daily.

Methodology: Socrata SoQL query, run server-side via the Cloudflare Worker proxy at nyc-streetlight-proxy.nddivecha.workers.dev:

SELECT borough, count(*)
WHERE complaint_type = 'Street Light Condition'
  AND created_date BETWEEN '2025-01-01T00:00:00' AND '2026-01-01T00:00:00'
GROUP BY borough

Volume Trend · 2020–2025 Verified

The line graph on the analytics page shows the same query above, run for each year from 2020 through 2025. The Street Light Condition complaint type starts in 2020 in the Socrata dataset (zero records pre-2020), so 2020–2025 is the full available span. The 2023 drop visible across every borough corresponds to a citywide methodology change at NYC DOT, not a real reduction in outages.

Borough202020212022202320242025
Bronx11,59715,50217,2036,9036,1846,751
Brooklyn12,48911,75610,9119,5128,6398,501
Manhattan5,62510,97110,7926,6394,9626,199
Queens14,39312,04213,2589,2228,5718,255
Staten Island4,8524,1764,6842,1032,3183,974

2. Resolution-Speed Percentages — 2025 Verified

Powers the "% closed within 7 / 30 / 90 days" progress bars and the "30-day closure rate by borough" chart on the analytics tab.

BoroughClosed < 7 daysClosed ≤ 30 daysClosed ≤ 90 days
Staten Island23.4%87.7%95.8%
Bronx50.8%83.0%89.1%
Queens51.1%80.8%87.8%
Brooklyn34.3%80.6%89.1%
Manhattan58.9%67.0%80.6%

Three SoQL queries against erm2-nwe9 per borough — one for each window:

SELECT count(*)
WHERE complaint_type = 'Street Light Condition'
  AND borough = '<BOROUGH>'
  AND closed_date IS NOT NULL
  AND date_diff_d(closed_date, created_date) <= <N>
  AND created_date BETWEEN '2025-01-01T00:00:00' AND '2026-01-01T00:00:00'

Then divided by the total filed in that borough/year (denominator from §1).

Key finding: Manhattan continues to have the worst 30-day closure rate (~67%, meaning 1 in 3 Manhattan complaints sit unresolved a month after filing) despite leading on same-week response. This pattern has held across multiple years and is the headline statistic on the landing page.

3. Median Days to Close Verified

One Socrata query per borough plus a citywide query, all 2025:

SELECT median(date_diff_d(closed_date, created_date)) AS m
WHERE complaint_type = 'Street Light Condition'
  AND closed_date IS NOT NULL
  AND borough = '<BOROUGH>'
  AND created_date BETWEEN '2025-01-01' AND '2026-01-01'
BoroughMedian days to close
Manhattan4d
Queens6d
Bronx7d
Brooklyn9d
Staten Island9d
Citywide~7d

4. Descriptor Mix Verified

Powers the "Complaint descriptor mix" horizontal bar chart. Top 20 problem descriptors per borough, summed for 2025; the chart shows the top 5 plus "Other".

SELECT descriptor, count(*)
WHERE complaint_type = 'Street Light Condition'
  AND borough = '<BOROUGH>'
  AND created_date BETWEEN '2025-01-01' AND '2026-01-01'
GROUP BY descriptor
ORDER BY count(*) DESC
LIMIT 20

5. Distribution of Close Times Verified

Bins the close-time of every closed 2025 complaint into the 7 buckets shown on the histogram (0–3, 4–7, 8–14, 15–21, 22–30, 31–60, 61+ days).

SELECT
  CASE
    WHEN date_diff_d(closed_date,created_date) <= 3  THEN '0-3'
    WHEN date_diff_d(closed_date,created_date) <= 7  THEN '4-7'
    WHEN date_diff_d(closed_date,created_date) <= 14 THEN '8-14'
    WHEN date_diff_d(closed_date,created_date) <= 21 THEN '15-21'
    WHEN date_diff_d(closed_date,created_date) <= 30 THEN '22-30'
    WHEN date_diff_d(closed_date,created_date) <= 60 THEN '31-60'
    ELSE '61+'
  END AS bucket,
  count(*)
WHERE complaint_type = 'Street Light Condition'
  AND borough = '<BOROUGH>'
  AND closed_date IS NOT NULL
  AND created_date BETWEEN '2025-01-01' AND '2026-01-01'
GROUP BY bucket

Estimated Completion Time (Box-and-Whisker) Forecast

The boxplot is derived from the histogram above — we don't run a separate query for individual close-day percentiles. For each percentile (Q1=25%, median=50%, Q3=75%, p95=95%) we walk the cumulative count across bins and linearly interpolate within the bin where the percentile falls, using the bin's day range as the interpolation interval (e.g. 8–14 days for the third bin). Without interpolation, adjacent percentiles that land in the same bin would collapse to the same midpoint value.

Worked example (Brooklyn, 2025): with histogram [1202, 1713, 3695, 61, 184, 431, 714] and total = 8,000, the 50% target (4,000) falls inside the 8–14d bin at fraction 0.29, giving median ≈ 9.8d; the 75% target (6,000) is also in the 8–14d bin but at fraction 0.83, giving Q3 ≈ 13.0d. Distinct values, same bin.

6. Vendor List Verified

The 5 closest vendors shown on the map for each complaint, plus the sidebar list, are pulled from a baked subset of the NYC Department of Buildings master electrician license roster. The roster is published publicly and the licenses themselves are real.

Filtering: the raw DOB roster includes ~1,875 licensed electricians, but many of those are institutions that hold a license only for in-house maintenance (universities, hospitals, NYCHA, MTA, NYPD, city agencies, building-management companies, etc.). The dashboard filters those out via a regex pattern over the business-name field, leaving 1,242 active licensed contractors as candidate vendors.

For each submitted complaint the dashboard:

  1. Geocodes the address via Google Maps Geocoding API to get a lat/lng.
  2. Computes a haversine distance from each filtered vendor's lat/lng to the complaint location.
  3. Returns the 5 nearest vendors, ranked 1 (closest) through 5.

Each vendor card displays only fields backed by the DOB roster: business name, license number, phone, distance, plus a Google search link to help find the vendor's actual website. Vendor star ratings and review counts have been deliberately omitted from this dashboard — we don't have a verified source for that data, so we don't show it.

7. Map — Routes & Geocoding Verified

8. Similar Past Complaints Verified

The "Similar resolved complaints" panel on the map page surfaces records from a 10,000-row baked sample (public/baked-complaints.json) generated by scripts/bake-from-worker.py. The script pulls 2,000 most-recent records per borough through the Cloudflare Worker, normalizes the schema to what the dashboard expects, and writes the file to disk; deploys ship that file as a static asset.

For a given submitted complaint, similarity is scored on borough match (+3), zip match (+4), and complaint-type match (+4), with a recency bonus (+3 for 2024+, +2 for 2023+, +1 for 2022+). Top 5 by score are shown.

The colored badge below each similar pin (and the chip in the side list) maps the close-time:

9. Work-Order Generation Generated

The work-order document on tab 03 is generated by Anthropic Claude Sonnet 4.5 via the /api/generate-work-order Vercel serverless function. The model receives:

The output is rendered on-screen as a styled DOT-format document. If the API is unreachable (e.g. on local python -m http.server which doesn't support POST), the dashboard falls back to a local synthesizer that produces the same JSON shape from the form input.

The "Approve & dispatch" button is disabled until a real work order has been generated; clicking it before then shows a guard message and refuses to advance.

10. Infrastructure Scale Verified

11. Per-Repair Cost Range Estimate

These figures are U.S. market rates for streetlight service calls and electrical labor. They are not NYC DOT contract rates — those are not publicly disclosed.

ComponentRangeSource
NYC licensed electrician labor$50–$200 / hrHomeyou
Bucket truck / aerial lift$60–$150 / hr ($1,500–$2,000 / day)Lightwaytraffic
Simple fixture / bulb repair (non-NYC baseline)$50–$120 totalEcosmart
Full fixture or pole replacement$1,000+Same as above
Annualized per-light maintenance (Los Angeles)~$200 / light / yearCity of LA

NYC-adjusted service-call range used on this site: $200 – $500 per completed complaint.

Rationale: the non-NYC floor of $50–$120 per repair is unrealistic in a union-heavy, high-cost-of-living market. We shift the floor to $200, anchor the midpoint at $350, and cap the ceiling at $500 — well below the $1,000+ full-replacement benchmark. The range is deliberately conservative.

12. Revenue Opportunity — 2025 Estimate

BoroughVolume 2025Low ($200/ea)Mid ($350/ea)High ($500/ea)
Brooklyn8,501$1.70M$2.98M$4.25M
Queens8,255$1.65M$2.89M$4.13M
Bronx6,751$1.35M$2.36M$3.38M
Manhattan6,199$1.24M$2.17M$3.10M
Staten Island3,974$0.79M$1.39M$1.99M
NYC Total33,835$6.77M$11.84M$16.92M

Methodology: annual_revenue = complaints_2025 × rate_per_repair for each rate tier.

Important disclaimer. Revenue estimates use published NYC electrical service-call market rates and 2025 311 complaint volumes. They are not NYC DOT contract data and do not represent actual payments to any current contractor. They do not account for:

13. Data Pipeline & Update Cadence

14. Complete Source List

15. How to Audit or Update These Numbers

  1. Re-pull current Socrata data: run python scripts/bake-from-worker.py 2025 from a machine that can reach the Cloudflare Worker URL. Writes a fresh public/baked-data.json + public/baked-complaints.json.
  2. Re-bake vendor list from CSV: run python scripts/bake-static.py after updating data/vendors_final.csv.
  3. Refine the revenue estimate: if NYC DOT publishes per-unit contract rates (e.g. via a FOIL response), replace the $200–$500 range in the script that computes the hero callouts and rerun.