intelligent-playlist-system
We present the Party Music Intelligence System (PMIS), a fully open‑source hardware/software platform that builds a party playlist from the songs guests request or mention in conversation. It combines speech processing, real‑time conversational analysis, AI‑driven playlist curation, and human‑in‑the‑loop refinement, all built on free and open‑source components. Every song requested or mentioned during the party is captured, placed in the context in which it came up, and woven into a coherent playback sequence that evolves as the party goes on.
- Hardware Architecture
All hardware is selected for openness, repairability, and performance.
| Component | Description | Open‑Source / Open‑Hardware Details |
| --- | --- | --- |
| Laser Microphone Array | 4–8 units placed on windows or walls to capture room vibrations without intrusive wiring. | Based on the OpenLaserMic project (photodiode + laser diode + op‑amp). Each unit connects via USB‑C to a central hub. |
| Central Processing Unit | A cluster of RISC‑V single‑board computers (e.g., SiFive HiFive Unmatched or StarFive VisionFive 2) to run speech, AI, and playback tasks. | Fully open‑source ISA; Linux‑capable; can be clustered for parallel processing. |
| Audio Playback | USB‑attached DAC (e.g., HiFiBerry DAC+ DSP) connected to a multichannel amplifier and distributed speakers. | HiFiBerry provides open‑source drivers; amplifier can be a DIY design from DIYAudio. |
| Input Interface | A large touchscreen (e.g., Waveshare 10.1” HDMI LCD) + wireless keyboard. | Open‑source kernel drivers; HDMI + USB. |
| Storage | NVMe SSD for music library, database, and model storage. | Standard open‑source filesystem support. |
All components are housed in a 3D‑printed enclosure (designs open‑sourced).
- Advanced AI Techniques for Coherent Playback
The “coherent playback scheme” is powered by a multi‑stage ranking and ordering pipeline:
· Feature Extraction: For each track, we compute:
  · Acoustic features (Librosa): tempo, key, loudness, timbre.
  · Lyrics embeddings (if available) via sentence‑transformers.
  · Metadata: genre, artist, release year, tags.
· Semantic Clustering: A large language model (e.g., Llama 3 70B quantized) is prompted with:
  > “You are a music curator. Given the following songs and the emotional context in which they were mentioned (e.g., ‘this song is about a hug between a couple’), group them into a logical sequence that tells a story. Output the ordered list with timestamps.”
  The LLM considers both the mentioned context and the intrinsic song themes.
· Seamless Transitions: The final ordering is smoothed using a dynamic programming algorithm that minimises acoustic discontinuities (key changes, tempo jumps) while respecting the semantic sequence.
· Real‑Time Adaptation: As new mentions arrive, the system can re‑order upcoming songs without interrupting playback, using a sliding‑window approach.
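The seamless‑transition step described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the LLM returns the songs as an ordered list of groups (story stages), that the group order must be kept, and that tracks inside a group may be rearranged freely. Within each group a bitmask dynamic program (Held‑Karp style) finds the ordering with the lowest total transition cost. The `Track` fields, the circle‑of‑fifths key distance, and the cost weights are illustrative assumptions, not values taken from PMIS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    bpm: float
    key: int  # pitch class 0-11 (0 = C); hypothetical encoding


def key_distance(a: int, b: int) -> int:
    """Steps between two pitch classes on the circle of fifths (0-6)."""
    pa, pb = (a * 7) % 12, (b * 7) % 12
    d = abs(pa - pb)
    return min(d, 12 - d)


def transition_cost(a: Track, b: Track,
                    tempo_weight: float = 0.1, key_weight: float = 1.0) -> float:
    """Assumed cost model: weighted tempo jump plus key distance."""
    return (tempo_weight * abs(a.bpm - b.bpm)
            + key_weight * key_distance(a.key, b.key))


def order_group(tracks, prev=None):
    """Order one group to minimise total transition cost.

    dp[mask][last] is the cheapest way to play the tracks in `mask`
    ending on `last`, optionally starting from the track `prev`.
    Runs in O(2^n * n^2), so groups should stay small (roughly n <= 12).
    """
    n = len(tracks)
    if n == 0:
        return []
    inf = float("inf")
    dp = [[inf] * n for _ in range(1 << n)]
    parent = [[-1] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = transition_cost(prev, tracks[i]) if prev is not None else 0.0
    for mask in range(1, 1 << n):
        for last in range(n):
            cur = dp[mask][last]
            if cur == inf:
                continue
            for nxt in range(n):
                if (mask >> nxt) & 1:
                    continue  # already played
                new_mask = mask | (1 << nxt)
                cand = cur + transition_cost(tracks[last], tracks[nxt])
                if cand < dp[new_mask][nxt]:
                    dp[new_mask][nxt] = cand
                    parent[new_mask][nxt] = last
    # Walk back from the cheapest final state to recover the order.
    mask = (1 << n) - 1
    last = min(range(n), key=lambda i: dp[mask][i])
    order = []
    while last != -1:
        order.append(tracks[last])
        prev_last = parent[mask][last]
        mask ^= 1 << last
        last = prev_last
    return order[::-1]


def smooth_playlist(groups):
    """Keep the group order; smooth the track order inside each group."""
    ordered = []
    for group in groups:
        prev = ordered[-1] if ordered else None
        ordered.extend(order_group(group, prev))
    return ordered


demo = [
    [Track("Opener", 125, 0)],
    [Track("A", 120, 0), Track("B", 90, 0), Track("C", 118, 0), Track("D", 95, 0)],
]
print([t.title for t in smooth_playlist(demo)])
# → ['Opener', 'A', 'C', 'D', 'B']
```

Calling `order_group` on only the next few queued tracks, with `prev` set to the track now playing, is one possible way to approximate the sliding‑window re‑ordering; the exact search is exponential in group size, so a larger window would need a heuristic instead.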
- Open‑Source Ecosystem Integration
All components are chosen for their open‑source licensing and active community:
Speech‑to‑Text: Whisper (MIT license), a multilingual speech‑recognition model that can run locally.
LLM: Llama 3 (Meta Llama 3 Community License; open weights, but not an OSI‑approved open‑source license) or Mistral 7B (Apache 2.0).
Music Metadata: MusicBrainz (open data), AcoustID (open‑source fingerprinting).
Audio Playback: mpv (GPL), Clementine (GPL).
Database: PostgreSQL (PostgreSQL License).
Web Dashboard: Flask (BSD), Vue.js (MIT).
Hardware: RISC‑V (open ISA), ESP32 (open SDK), KiCad for PCB design.
- Scalability & Future Enhancements
The system can be expanded to multiple rooms using synchronized playback (e.g., Snapcast).
A recommendation engine using collaborative filtering (e.g., the implicit library) can suggest songs that were not explicitly mentioned but fit stylistically.
Edge‑optimized versions can run on a single Raspberry Pi 5 (if the LLM is replaced with a smaller model like TinyLlama), while the full RISC‑V cluster handles larger parties.
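The collaborative‑filtering suggestion mentioned above can be illustrated with a small, dependency‑free sketch. It is a stand‑in for a library such as implicit, not that library's API: it computes item‑to‑item cosine similarity from co‑occurrence in past play sessions and ranks tracks that were not mentioned. The session data, track ids, and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_similarity(sessions):
    """Item-item cosine similarity from binary co-occurrence.

    `sessions` is a list of sets of track ids that were played together
    (for example, one set per past party).
    """
    counts = defaultdict(int)       # sessions containing each track
    pair_counts = defaultdict(int)  # sessions containing both tracks
    for session in sessions:
        for track in session:
            counts[track] += 1
        for a, b in combinations(sorted(session), 2):
            pair_counts[(a, b)] += 1
    sim = defaultdict(dict)
    for (a, b), together in pair_counts.items():
        score = together / sqrt(counts[a] * counts[b])
        sim[a][b] = score
        sim[b][a] = score
    return sim


def suggest(sim, mentioned, k=3):
    """Rank unmentioned tracks by summed similarity to the mentioned ones."""
    scores = defaultdict(float)
    for track in mentioned:
        for other, score in sim.get(track, {}).items():
            if other not in mentioned:
                scores[other] += score
    # Sort by score (descending), then by id so ties are deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


history = [{"a", "b", "c"}, {"a", "b"}, {"b", "d"}, {"c", "d"}]
similarity = build_similarity(history)
print(suggest(similarity, {"a", "b"}))
# → ['c', 'd']
```

With real listening histories a matrix‑factorisation model would likely generalise better than raw co‑occurrence; the sketch only shows the shape of the data flow from mentioned songs to suggestions.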