We present the Party Music Intelligence System (PMIS), an open‑source hardware/software platform for AI‑assisted party music curation. It combines speech processing, real‑time conversational analysis, AI‑driven playlist curation, and human‑in‑the‑loop refinement, all built on free and open‑source components. Every song requested or mentioned during the party is captured, placed in context, and woven into a coherent playback sequence, so the music evolves with the party.

System Overview

The PMIS operates in a closed loop:

Input – Partygoers type song names on a keyboard (or speak them).
Audio Capture – An array of laser microphones records all conversations in the room.
Speech‑to‑Text – Conversations are transcribed in real time using a local, open‑source ASR engine.
Music Mention Extraction – An LLM identifies and contextualizes music references (songs, artists, genres, mood cues).
Playlist Curation Engine – All gathered requests and mentions are merged into a dynamic playlist, ordered by semantic coherence (lyrical themes, emotional arc, genre flow).
Playback – The playlist is played through a high‑quality audio system, and an .m3u file is saved for each session.
DJ Interface – A human DJ can review, reorder, and refine playlists, triggering new curation passes.
Database – All tracks, requests, and contextual metadata are stored for future analysis.

All software is open source; all hardware is based on open‑source designs and readily available components.

Hardware Architecture

All hardware is selected for openness, repairability, and performance.

Component	Description	Open‑Source / Open‑Hardware Details
Laser Microphone Array	4–8 units placed on windows or walls to capture room vibrations without intrusive wiring.	Based on the OpenLaserMic project (laser diode + photodiode + op‑amp). Each unit digitizes its signal locally and streams it to the central hub over Wi‑Fi.
Central Processing Unit	A cluster of RISC‑V single‑board computers (e.g., SiFive HiFive Unmatched or StarFive VisionFive 2) to run speech, AI, and playback tasks.	Fully open‑source ISA; Linux‑capable; can be clustered for parallel processing.
Audio Playback	DAC (e.g., a USB audio‑class DAC, or an I²S board such as the HiFiBerry DAC+ DSP) feeding a multichannel amplifier and distributed speakers.	HiFiBerry provides open‑source drivers; the amplifier can be a DIY design from the diyAudio community.
Input Interface	A large touchscreen (e.g., Waveshare 10.1” HDMI LCD) + wireless keyboard.	Open‑source kernel drivers; HDMI + USB.
Storage	NVMe SSD for music library, database, and model storage.	Standard open‑source filesystem support.

All components are housed in a 3D‑printed enclosure (designs open‑sourced).

Software Stack

The system runs a Linux distribution with RISC‑V support (e.g., Debian's riscv64 port), with all software containerized using Podman for easy deployment.

3.1. Core Open‑Source Components

Layer	Technology	Role
OS	Linux (RISC‑V port)	Base operating system
Speech‑to‑Text	Whisper (OpenAI’s open‑source model) with faster‑whisper (optimized inference)	Real‑time transcription of conversation streams
Language Model	Llama 3 (or Mistral) – quantized 8‑bit version	Music mention extraction, conversation summarisation, thematic analysis
Music Database	Lyrion Music Server (LMS, formerly Logitech Media Server) + MusicBrainz Picard for tagging	Manage the local music library; fetch metadata and AcoustID fingerprints
Playlist Curation	Custom Python service using Librosa (audio features) + sentence‑transformers (lyrics embeddings) + LLM (semantic ordering)	Generates coherent playlists from raw requests/mentions
Playlist Format	Extended M3U (#EXTINF carries duration, artist, and title; mood tags need a non‑standard extension)	Saved per session with a timestamp
DJ Interface	Web‑based dashboard (Flask + Vue.js)	Real‑time playlist editing, re‑ranking, manual overrides
Message Bus	Redis (or NATS)	Inter‑process communication between ASR, LLM, curation, and DJ tools
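To make the playlist‑format row concrete, here is a minimal Python sketch of an Extended M3U writer, assuming the common `#EXTM3U` header and `#EXTINF:<seconds>,<artist> - <title>` conventions. The `Track` record, the function names, and the `#EXTMOOD:` line for mood tags are hypothetical; because M3U treats lines starting with `#` as comments or directives, players that do not recognize the mood line should skip it.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path


@dataclass
class Track:
    """Hypothetical track record; field names are illustrative."""
    path: str            # file path or URL of the audio file
    artist: str
    title: str
    duration_s: int      # length in whole seconds (-1 if unknown)
    mood_tags: list[str] = field(default_factory=list)


def render_m3u(tracks: list[Track]) -> str:
    """Render tracks as Extended M3U text."""
    lines = ["#EXTM3U"]
    for t in tracks:
        lines.append(f"#EXTINF:{t.duration_s},{t.artist} - {t.title}")
        if t.mood_tags:
            # Non-standard directive (assumption): not part of Extended M3U.
            lines.append("#EXTMOOD:" + ",".join(t.mood_tags))
        lines.append(t.path)
    return "\n".join(lines) + "\n"


def session_filename(when: datetime) -> str:
    """Build the playlist_YYYYMMDD_HHMM.m3u name used for session archives."""
    return when.strftime("playlist_%Y%m%d_%H%M.m3u")


def write_session_playlist(tracks: list[Track], out_dir: Path, when: datetime) -> Path:
    """Write the rendered playlist into out_dir and return the file path."""
    out_path = out_dir / session_filename(when)
    out_path.write_text(render_m3u(tracks), encoding="utf-8")
    return out_path
```

For example, `session_filename(datetime(2026, 3, 28, 2, 0))` yields `playlist_20260328_0200.m3u`. Writing UTF‑8 into `.m3u` files is common in practice, although the `.m3u8` extension is the one that signals UTF‑8 explicitly.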

3.2. Data Flow

Audio Capture – Each laser microphone streams 16‑bit / 44.1 kHz audio to a dedicated PulseAudio sink.
Conversation Transcription – A Whisper service continuously pulls 5‑second audio chunks from the aggregate stream (resampled to the 16 kHz mono input Whisper expects) and outputs timestamped text, while pyannote.audio (open source) performs speaker diarization to separate speakers.
Music Reference Extraction – The transcribed text is fed into a fine‑tuned LLM that:
  · extracts song titles, artists, and genres;
  · captures context, e.g., “this song reminds me of my first kiss” → emotional theme = “romantic nostalgia”;
  · assigns a relevance score.
  All references are stored in a PostgreSQL database with timestamps and speaker IDs.
Real‑Time Playlist Curation – A background process runs every 5 minutes (or on demand):
  · collects all new references since the last curation pass;
  · resolves each reference to a canonical recording using MusicBrainz metadata and matches it to a file in the local music library;
  · merges in any manually typed requests (from the keyboard);
  · builds a graph of tracks whose edges represent “semantic similarity” (a combination of audio features, such as danceability and energy, and lyrics embeddings);
  · performs a traveling‑salesman‑like ordering to maximise smooth transitions (e.g., songs about “hugs” precede songs about “relationships”);
  · writes a new .m3u file and pushes it to the playback queue.
Playback & Human Refinement –
  · The audio player (mpv or Clementine) loads each updated playlist and plays it in order; mpv, for instance, can be driven externally through its JSON IPC interface.
  · The DJ sees the live playlist on the web dashboard and can drag and drop songs, remove tracks, or request a fresh curation pass with different parameters (e.g., “more upbeat”).
  · The DJ can also “promote” a mentioned song to the top of the queue.
Session Archiving – At the end of the night, a final .m3u playlist is saved as playlist_YYYYMMDD_HHMM.m3u. All raw conversation logs, extracted references, and curation decisions are stored for offline analysis.
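The music‑reference extraction step relies on the LLM returning structured output. The sketch below assumes the model is asked for a JSON array with `title`, `artist`, `genre`, `context`, and `relevance` keys; the prompt wording, field names, the `Mention` type, and the 0.5 relevance threshold are all hypothetical, and the model call itself is omitted so that only the parsing and filtering logic is shown.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical instruction sent to the LLM together with each transcript chunk.
EXTRACTION_PROMPT = (
    "Extract every music reference from the transcript below. "
    "Return only a JSON array of objects with keys: "
    '"title", "artist", "genre", "context", "relevance" (0.0-1.0).\n\n'
    "Transcript:\n{transcript}"
)


@dataclass
class Mention:
    title: Optional[str]
    artist: Optional[str]
    genre: Optional[str]
    context: str
    relevance: float
    speaker_id: str
    timestamp_s: float


def parse_mentions(llm_output: str, speaker_id: str, timestamp_s: float,
                   min_relevance: float = 0.5) -> list[Mention]:
    """Parse the LLM's JSON reply, dropping malformed or low-relevance items."""
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []  # the model ignored the requested format; skip this chunk
    if not isinstance(items, list):
        return []
    mentions = []
    for item in items:
        if not isinstance(item, dict):
            continue
        try:
            relevance = float(item.get("relevance", 0.0))
        except (TypeError, ValueError):
            continue
        # A usable mention needs at least a title, an artist, or a genre.
        has_target = any(item.get(k) for k in ("title", "artist", "genre"))
        if relevance < min_relevance or not has_target:
            continue
        mentions.append(Mention(
            title=item.get("title"),
            artist=item.get("artist"),
            genre=item.get("genre"),
            context=str(item.get("context", "")),
            relevance=relevance,
            speaker_id=speaker_id,
            timestamp_s=timestamp_s,
        ))
    return mentions
```

Treating a malformed reply as "no mentions" keeps one bad generation from stalling the pipeline; the surviving records could then be inserted into PostgreSQL together with their timestamps and speaker IDs.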
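The traveling‑salesman‑like ordering in the curation step can be approximated with the classic nearest‑neighbour heuristic: start from the current track and repeatedly append the most similar track not yet used. This sketch assumes each track already has a numeric feature vector (normalized audio features such as danceability and energy, concatenated with a lyrics embedding) and uses cosine similarity; the 50/50 weighting in `combined_vector` is an arbitrary placeholder, and all names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def combined_vector(audio: list[float], lyrics: list[float],
                    audio_weight: float = 0.5) -> list[float]:
    """Concatenate audio and lyrics features with a placeholder weighting.
    Assumes both parts are already normalized to comparable scales."""
    return [audio_weight * x for x in audio] + [(1.0 - audio_weight) * y for y in lyrics]


def nearest_neighbour_order(features: dict[str, list[float]], start: str) -> list[str]:
    """Greedy TSP heuristic: always move to the most similar unused track."""
    order = [start]
    remaining = set(features) - {start}
    while remaining:
        current = features[order[-1]]
        # sorted() makes tie-breaking deterministic (alphabetical by track id).
        best = max(sorted(remaining), key=lambda tid: cosine(current, features[tid]))
        order.append(best)
        remaining.remove(best)
    return order
```

Nearest‑neighbour ordering costs O(n²) similarity evaluations for n tracks and is not guaranteed to find the best tour; a 2‑opt improvement pass is a common next step if some transitions still feel abrupt.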

Advanced AI Techniques for Coherent Playback

The “coherent playback scheme” is powered by a multi‑stage ranking and ordering pipeline:

· Feature Extraction – For each track, we compute:
  · acoustic features (Librosa): tempo, chroma (for key estimation), loudness (RMS), and timbre (MFCCs);
  · lyrics embeddings (if lyrics are available) via sentence‑transformers;
  · metadata: genre, artist, release year, tags.
· Semantic Clustering – A large language model (e.g., Llama 3 70B, quantized) is prompted with:

“You are a music curator. Given the following songs and the emotional context in which they were mentioned (e.g., ‘this song is about a hug between a couple’), group them into a logical sequence that tells a story. Output the ordered list with timestamps.”

The LLM considers both the mentioned context and the intrinsic song themes.
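Free‑text replies to a curator prompt are hard to check mechanically. One hedged variant (not the prompt quoted above) asks the model to answer with song numbers and then validates the reply: out‑of‑range and repeated numbers are dropped, and any song the model left out is appended at the end so that no request is silently lost. The prompt text and function names are illustrative assumptions.

```python
import re

# Hypothetical variant of the curator prompt that asks for song numbers,
# which are easier to validate than free-text titles.
ORDER_PROMPT = (
    "You are a music curator. Given the following songs and the emotional context "
    "in which they were mentioned, arrange them into a logical sequence that tells "
    "a story. Reply with the song numbers only, one per line, in playing order.\n\n"
    "{songs}"
)


def build_order_prompt(songs: list[tuple[str, str]]) -> str:
    """songs: (label, mention context) pairs, e.g. ("Perfect - Ed Sheeran", "romantic memory")."""
    listing = "\n".join(
        f"{i}. {label} (context: {ctx})" for i, (label, ctx) in enumerate(songs, 1)
    )
    return ORDER_PROMPT.format(songs=listing)


def parse_order(reply: str, n_songs: int) -> list[int]:
    """Convert the reply into 0-based indices.

    Only the leading number of each line is used; invalid or repeated numbers
    are dropped, and songs the model omitted are appended in their original order."""
    order: list[int] = []
    for line in reply.splitlines():
        match = re.match(r"\s*(\d+)", line)
        if not match:
            continue
        idx = int(match.group(1)) - 1
        if 0 <= idx < n_songs and idx not in order:
            order.append(idx)
    order += [i for i in range(n_songs) if i not in order]
    return order
```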

· Seamless Transitions – The final ordering is smoothed using a dynamic programming algorithm that minimises acoustic discontinuities (key changes, tempo jumps) while respecting the semantic sequence.
· Real‑Time Adaptation – As new mentions arrive, the system can re‑order upcoming songs without interrupting playback, using a sliding‑window approach.
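One way to read "minimises acoustic discontinuities while respecting the semantic sequence" is as a Viterbi‑style dynamic program: the LLM's ordering fixes a sequence of slots, each slot holds a few interchangeable candidate tracks, and the DP picks one candidate per slot so that the summed transition cost is smallest. The slot model, the `Candidate` type, and the cost function (distance around the circle of fifths plus a weighted BPM jump, ignoring major/minor mode) are assumptions for illustration. Each slot needs at least one candidate, and candidate lists are assumed not to share tracks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    track_id: str
    key: int      # pitch class 0-11 (C=0, C#=1, ...); mode ignored for simplicity
    bpm: float


def fifths_distance(a: int, b: int) -> int:
    """Steps between two keys around the circle of fifths (0-6)."""
    d = abs((a * 7) % 12 - (b * 7) % 12)
    return min(d, 12 - d)


def transition_cost(x: Candidate, y: Candidate,
                    key_w: float = 1.0, tempo_w: float = 0.1) -> float:
    """Hypothetical cost: key distance plus a weighted BPM jump."""
    return key_w * fifths_distance(x.key, y.key) + tempo_w * abs(x.bpm - y.bpm)


def smooth_sequence(slots: list[list[Candidate]]) -> list[Candidate]:
    """Pick one candidate per slot, in slot order, minimising total transition cost."""
    if not slots:
        return []
    # best[i][j]: cheapest cost of any path ending at candidate j of slot i.
    best = [[0.0] * len(slots[0])]
    back: list[list[int]] = [[-1] * len(slots[0])]
    for i in range(1, len(slots)):
        row, ptr = [], []
        for cand in slots[i]:
            costs = [best[i - 1][k] + transition_cost(prev, cand)
                     for k, prev in enumerate(slots[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min])
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final state.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(slots) - 1, -1, -1):
        path.append(slots[i][j])
        j = back[i][j]
    return path[::-1]
```

With S slots of C candidates each, the DP performs O(S·C²) cost evaluations, which is negligible at party‑playlist scale.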
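The sliding‑window adaptation can be sketched as follows: the current track and the next few queued tracks are frozen, so the player's immediate queue never changes at the last moment, while the rest of the queue, together with newly mentioned tracks, is handed back to an ordering pass. The `reorder` hook, the function name, and the default window of two locked tracks are hypothetical; any ordering method (greedy, LLM‑based, or dynamic programming) could be plugged in.

```python
from typing import Callable, Optional


def refresh_upcoming(queue: list[str], now_index: int, new_tracks: list[str],
                     reorder: Callable[[Optional[str], list[str]], list[str]],
                     locked: int = 2) -> list[str]:
    """Return a new queue where everything up to `locked` tracks after the
    current one is untouched and only the tail (plus new tracks) is re-ordered.

    `reorder(anchor, tracks)` is a hypothetical hook that orders `tracks` to
    follow `anchor` (the last frozen track, or None for an empty queue)."""
    frozen_end = min(now_index + 1 + locked, len(queue))
    head = queue[:frozen_end]   # played, now playing, and locked tracks
    tail = queue[frozen_end:]
    # Add new tracks not already queued, de-duplicated in arrival order.
    seen = set(queue)
    pool = tail + [t for t in dict.fromkeys(new_tracks) if t not in seen]
    if not pool:
        return head
    anchor = head[-1] if head else None
    return head + reorder(anchor, pool)
```

Locking a small window trades some responsiveness for stability: a fresh mention can land no earlier than right after the locked tracks, which is what the DJ's "promote" action is there to override.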

Laser Microphone Details

The laser microphone array is the most exotic component, but it is built entirely from open‑source plans and commonly available parts:

Each unit consists of a laser diode (650 nm, 5 mW), a photodiode (e.g., BPW34), and a transimpedance amplifier (LM324).
The analog signal is digitized by an ESP32‑S3 (programmed with Espressif's open‑source ESP‑IDF framework) through an I²S ADC (e.g., PCM1808).
The ESP32 streams audio over Wi‑Fi to the central server, e.g., as an RTP stream.
The entire design (PCB schematics, 3D‑printed housing) is available under CERN OHL.

The array is placed on windows, glass surfaces, or even a whiteboard to capture voice‑induced vibrations, allowing the system to “hear” conversations without traditional microphones.

Open‑Source Ecosystem Integration

All components are chosen for their open‑source licensing and active community:

Speech‑to‑Text: Whisper (MIT license) – state‑of‑the‑art accuracy.
LLM: Llama 3 (open weights under the Meta Llama 3 Community License, which is not an OSI‑approved open‑source license) or Mistral 7B (Apache 2.0).
Music Metadata: MusicBrainz (open data), AcoustID (open‑source fingerprinting).
Audio Playback: mpv (GPL), Clementine (GPL).
Database: PostgreSQL (PostgreSQL License).
Web Dashboard: Flask (BSD), Vue.js (MIT).
Hardware: RISC‑V (open ISA), ESP32 (open SDK), KiCad for PCB design.

System Workflow Example

Setup – The DJ places laser microphones on windows, boots the central unit, and launches the PMIS software.
Party Start – A guest types “Celebration Song” on the keyboard. The song is queued immediately and added to the session’s .m3u.
Conversation Capture – Two guests talk: “This song reminds me of that night we danced to ‘Perfect’ by Ed Sheeran.” The system transcribes the sentence, extracts “Perfect” by Ed Sheeran, and notes the context: “romantic memory.”
Real‑Time Curation – The curation engine places “Perfect” after the current track, then adjusts later slots to maintain a thematic arc: love songs → party songs → chill‑out.
DJ Override – The DJ sees that too many slow songs are stacking up and uses the web dashboard to promote an upbeat mention (“Uptown Funk”) to the next slot.
Archival – At 2 AM, the system saves playlist_20260328_0200.m3u containing all 120 tracks played, along with a JSON log of every mention and curation decision.

Scalability & Future Enhancements

The system can be expanded to multiple rooms using synchronized playback (e.g., Snapcast).
A recommendation engine using collaborative filtering (e.g., Implicit library) can suggest songs not explicitly mentioned but stylistically fitting.
Edge‑optimized versions can run on a single Raspberry Pi 5 (if the LLM is replaced with a smaller model like TinyLlama), while the full RISC‑V cluster handles larger parties.

Conclusion

The Party Music Intelligence System shows how an AI‑driven music curation platform can be built from free and open‑source software and openly documented hardware. By combining laser microphone arrays, real‑time speech transcription, large language models, and human‑in‑the‑loop refinement, it creates an immersive, adaptive musical experience that responds to the social dynamics of any party. All components are available today, and the entire system can be reproduced, modified, and improved by any enthusiast or researcher.