We present a Party Music Intelligence System (PMIS), a fully open‑source hardware/software platform that captures song requests and music mentioned in conversation at a party and turns them into a curated playlist. It combines speech processing, real‑time conversational analysis, AI‑driven playlist curation, and human‑in‑the‑loop refinement, all built on free and open‑source components. The result is an evolving musical experience in which every song requested or mentioned during the party is captured, contextualized, and woven into a coherent playback sequence.


  1. System Overview

The PMIS operates in a closed loop:

  1. Input – Partygoers type song names on a keyboard (or speak them).

  2. Audio Capture – An array of laser microphones records all conversations in the room.

  3. Speech‑to‑Text – Conversations are transcribed in real time using a local, open‑source ASR engine.

  4. Music Mention Extraction – An LLM identifies and contextualizes music references (songs, artists, genres, mood cues).

  5. Playlist Curation Engine – All gathered requests and mentions are merged into a dynamic playlist, ordered by semantic coherence (lyrical themes, emotional arc, genre flow).

  6. Playback – The playlist is played through a high‑quality audio system, and an .m3u file is saved for each session.

  7. DJ Interface – A human DJ can review, reorder, and refine playlists, triggering new curation passes.

  8. Database – All tracks, requests, and contextual metadata are stored for future analysis.

All software is open source; all hardware is based on open‑source designs and readily available components.
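Steps 4 and 8 of the loop imply a record that travels from the LLM's output into the database. The sketch below shows one possible shape for it, assuming the LLM is asked to reply with a JSON array of objects. The `MusicMention` fields, the `parse_mentions` and `store_mentions` helpers, and the table layout are all hypothetical, and SQLite stands in for the system's database only to keep the example self-contained.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class MusicMention:
    """One music reference extracted from a transcript (hypothetical schema)."""
    title: str
    artist: str
    context: str       # e.g. "romantic memory"
    relevance: float   # score assigned by the LLM
    speaker_id: str
    timestamp: float   # seconds since session start


def parse_mentions(llm_json: str, speaker_id: str, timestamp: float) -> list[MusicMention]:
    """Parse the LLM's JSON reply, skipping entries that lack a title."""
    mentions = []
    for item in json.loads(llm_json):
        title = str(item.get("title", "")).strip()
        if not title:
            continue
        mentions.append(MusicMention(
            title=title,
            artist=str(item.get("artist", "")).strip(),
            context=str(item.get("context", "")).strip(),
            relevance=float(item.get("relevance", 0.0)),
            speaker_id=speaker_id,
            timestamp=timestamp,
        ))
    return mentions


def store_mentions(conn: sqlite3.Connection, mentions: list[MusicMention]) -> None:
    """Persist mentions; SQLite is an assumption made for this sketch only."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mentions ("
        "title TEXT, artist TEXT, context TEXT, "
        "relevance REAL, speaker_id TEXT, timestamp REAL)"
    )
    conn.executemany(
        "INSERT INTO mentions VALUES (?, ?, ?, ?, ?, ?)",
        [(m.title, m.artist, m.context, m.relevance, m.speaker_id, m.timestamp)
         for m in mentions],
    )
    conn.commit()
```

Skipping entries without a title keeps malformed LLM output from reaching the curation engine; a production service would likely also validate the relevance range and log rejected entries.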


  2. Hardware Architecture

All hardware is selected for openness, repairability, and performance.

Component | Description | Open‑Source / Open‑Hardware Details
----------|-------------|------------------------------------
Laser Microphone Array | 4–8 units placed on windows or walls to capture room vibrations without intrusive wiring. | Based on the OpenLaserMic project (photodiode + laser diode + op‑amp). Each unit connects via USB‑C to a central hub.
Central Processing Unit | A cluster of RISC‑V single‑board computers (e.g., SiFive HiFive Unmatched or StarFive VisionFive 2) to run speech, AI, and playback tasks. | Fully open‑source ISA; Linux‑capable; can be clustered for parallel processing.
Audio Playback | DAC (e.g., HiFiBerry DAC+ DSP) connected to a multichannel amplifier and distributed speakers. The HiFiBerry is an I2S add‑on board rather than a USB device, so it needs a host with a compatible GPIO header; a USB Audio Class DAC is the alternative. | HiFiBerry provides open‑source drivers; amplifier can be a DIY design from DIYAudio.
Input Interface | A large touchscreen (e.g., Waveshare 10.1” HDMI LCD) + wireless keyboard. | Open‑source kernel drivers; HDMI + USB.
Storage | NVMe SSD for music library, database, and model storage. | Standard open‑source filesystem support.

All components are housed in a 3D‑printed enclosure (designs open‑sourced).


  3. Software Stack

The system runs a Linux distribution with RISC‑V support (e.g., the Debian riscv64 port), with all services containerized using Podman for easy deployment.

3.1. Core Open‑Source Components

Layer | Technology | Role
------|------------|-----
OS | Linux (RISC‑V port) | Base operating system
Speech‑to‑Text | Whisper (OpenAI’s open‑source model) with faster‑whisper (optimized inference) | Real‑time transcription of conversation streams
Language Model | Llama 3 (or Mistral), quantized 8‑bit version | Music mention extraction, conversation summarisation, thematic analysis
Music Database | LMS (Logitech Media Server, now Lyrion Music Server; open source) + MusicBrainz Picard for tagging | Manage local music library; fetch metadata and AcoustID fingerprints
Playlist Curation | Custom Python service using Librosa (audio features) + sentence‑transformers (lyrics embeddings) + LLM (semantic ordering) | Generates coherent playlists from raw requests/mentions
Playlist Format | .m3u with extended info (artist, title, duration, mood tags) | Saved per session with timestamp
DJ Interface | Web‑based dashboard (Flask + Vue.js) | Real‑time playlist editing, re‑ranking, manual overrides
Message Bus | Redis (or NATS) | Inter‑process communication between ASR, LLM, curation, and DJ tools

3.2. Data Flow

  1. Audio Capture – Each laser microphone streams 16‑bit / 44.1 kHz audio to a dedicated PulseAudio source.

  2. Conversation Transcription – A Whisper service continuously pulls audio chunks (5‑second windows) from the aggregate stream, resampled to the 16 kHz mono input that Whisper models expect, runs diarization (using pyannote.audio, which is open source) to separate speakers, and outputs timestamped text.

  3. Music Reference Extraction – The transcribed text is fed into a fine‑tuned LLM that:
     · Extracts song titles, artists, and genres.
     · Captures context: e.g., “this song reminds me of my first kiss” → emotional theme = “romantic nostalgia”.
     · Assigns a relevance score.
     All references are stored in a PostgreSQL database with timestamps and speaker IDs.

  4. Real‑Time Playlist Curation – A background process runs every 5 minutes (or on demand):
     · Collects all new references since the last curation.
     · For each reference, resolves the exact track in the local music library, using MusicBrainz metadata (via its API) to disambiguate titles and artists.
     · Combines them with any manually typed requests (from the keyboard).
     · Builds a graph of tracks where edges represent “semantic similarity” (using a combination of audio features, e.g., danceability and energy, and lyrics embeddings).
     · Performs a traveling‑salesman‑like ordering to maximise smooth transitions (e.g., songs about “hugs” precede songs about “relationships”).
     · Outputs a new .m3u file and pushes it to the playback queue.

  5. Playback & Human Refinement
     · The audio player (mpv or Clementine) monitors the playlist file and plays it in order.
     · The DJ sees the live playlist on the web dashboard and can drag/drop songs, remove tracks, or request a fresh curation pass with different parameters (e.g., “more upbeat”).
     · The DJ can also “promote” a mentioned song to the top of the queue.

  6. Session Archiving – At the end of the night, a final .m3u playlist is saved as playlist_YYYYMMDD_HHMM.m3u. All raw conversation logs, extracted references, and curation decisions are stored for offline analysis.
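The graph ordering in step 4 can be approximated in a few lines. The sketch below is a greedy nearest‑neighbour heuristic over cosine similarity, a common stand‑in for a traveling‑salesman‑like ordering rather than the exact method the curation engine uses. The feature vectors are placeholders: in the real system they would presumably concatenate audio features and lyrics embeddings. The names `cosine` and `order_tracks` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def order_tracks(features: dict[str, list[float]], start: str) -> list[str]:
    """Greedy tour: after each track, play the most similar unplayed track."""
    order = [start]
    remaining = set(features) - {start}
    while remaining:
        current = features[order[-1]]
        # Iterate in sorted order so ties are broken deterministically.
        nxt = max(sorted(remaining), key=lambda t: cosine(current, features[t]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    # Toy two-dimensional features; real vectors would be much longer.
    demo = {
        "a": [1.0, 0.0],
        "b": [0.9, 0.1],
        "c": [0.0, 1.0],
        "d": [0.1, 0.9],
    }
    print(order_tracks(demo, "a"))  # ['a', 'b', 'd', 'c']
```

The greedy pass runs in quadratic time and can get stuck with a poor final transition; a 2‑opt refinement or an LLM re‑ranking pass could follow it if smoother arcs are needed.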


  4. Advanced AI Techniques for Coherent Playback

The “coherent playback scheme” is powered by a multi‑stage ranking and ordering pipeline:

· Feature Extraction – For each track, we compute:
  · Acoustic features (Librosa): tempo, key, loudness, timbre.
  · Lyrics embeddings (if available) via sentence‑transformers.
  · Metadata: genre, artist, release year, tags.
· Semantic Clustering – A large language model (e.g., Llama 3 70B quantized) is prompted with:

“You are a music curator. Given the following songs and the emotional context in which they were mentioned (e.g., ‘this song is about a hug between a couple’), group them into a logical sequence that tells a story. Output the ordered list with timestamps.”

The LLM considers both the mentioned context and the intrinsic song themes.
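As an illustration of how the prompt above might be assembled from extracted mentions, the following sketch appends a numbered song list to the instruction text. The function name `build_curation_prompt`, the tuple layout, and the per‑line format are assumptions; how the prompt is then sent to the model depends on the inference runtime and is not shown.

```python
def build_curation_prompt(mentions: list[tuple[str, str, str]]) -> str:
    """Build the curator prompt from (title, artist, context) tuples."""
    header = (
        "You are a music curator. Given the following songs and the "
        "emotional context in which they were mentioned, group them into "
        "a logical sequence that tells a story. "
        "Output the ordered list with timestamps."
    )
    lines = [
        f'{i}. "{title}" by {artist} (context: {context or "none given"})'
        for i, (title, artist, context) in enumerate(mentions, start=1)
    ]
    return header + "\n\n" + "\n".join(lines)


if __name__ == "__main__":
    print(build_curation_prompt([
        ("Perfect", "Ed Sheeran", "romantic memory"),
        ("Uptown Funk", "Mark Ronson", ""),
    ]))
```

Typed keyboard requests carry no conversational context, so the sketch marks them as "none given" rather than leaving the field blank.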


  5. Laser Microphone Details

The laser microphone array is the most exotic component, but it is built entirely from open‑source plans and commonly available parts.

The array is placed on windows, glass surfaces, or even a whiteboard to capture voice‑induced vibrations, allowing the system to “hear” conversations without traditional microphones.


  6. Open‑Source Ecosystem Integration

All components are chosen for their open‑source licensing and active community.


  7. System Workflow Example

  1. Setup – The DJ places laser microphones on windows, boots the central unit, and launches the PMIS software.

  2. Party Start – A guest types “Celebration Song” on the keyboard. The song is instantly queued and added to the session’s .m3u.

  3. Conversation Capture – Two guests talk: “This song reminds me of that night we danced to ‘Perfect’ by Ed Sheeran.” The system transcribes the sentence, extracts “Perfect” by Ed Sheeran, and notes the context: “romantic memory.”

  4. Real‑Time Curation – The curation engine places “Perfect” after the current track, then adjusts later slots to maintain a thematic arc: love songs → party songs → chill‑out.

  5. DJ Override – The DJ sees that too many slow songs are stacking up and uses the web dashboard to promote an upbeat mention (“Uptown Funk”) to the next slot.

  6. Archival – At 2 AM, the system saves playlist_20260328_0200.m3u containing all 120 tracks played, along with a JSON log of every mention and curation decision.
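The archival step can be sketched as follows, assuming the widely used extended M3U layout (`#EXTM3U` header, one `#EXTINF:<seconds>,<artist> - <title>` line per track) and a JSON log written beside the playlist. The function name `archive_session` and the track dictionary keys are hypothetical, and mood tags are omitted because extended M3U has no standard field for them.

```python
import json
from datetime import datetime
from pathlib import Path


def archive_session(tracks: list[dict], log: list[dict],
                    out_dir: str, when: datetime) -> Path:
    """Write playlist_YYYYMMDD_HHMM.m3u and a matching .json log.

    Each track dict needs 'path', 'artist', 'title', and 'duration' (seconds).
    Returns the path of the playlist file.
    """
    stem = when.strftime("playlist_%Y%m%d_%H%M")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    lines = ["#EXTM3U"]
    for t in tracks:
        # EXTINF carries whole seconds followed by a display title.
        lines.append(f"#EXTINF:{int(t['duration'])},{t['artist']} - {t['title']}")
        lines.append(t["path"])

    m3u = out / f"{stem}.m3u"
    m3u.write_text("\n".join(lines) + "\n", encoding="utf-8")
    (out / f"{stem}.json").write_text(json.dumps(log, indent=2), encoding="utf-8")
    return m3u
```

Calling it with `datetime(2026, 3, 28, 2, 0)` produces the `playlist_20260328_0200.m3u` name used in the example above.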


  8. Scalability & Future Enhancements

  9. Conclusion

The Party Music Intelligence System shows how an AI‑driven music curation platform can be built using only free and open‑source hardware and software. By combining laser microphone arrays, real‑time speech transcription, large language models, and human‑in‑the‑loop refinement, it creates an immersive, adaptive musical experience that captures the social dynamics of any party. All components are available today, and the entire system can be reproduced, modified, and improved by any enthusiast or researcher.