[fpc-other] fpc and voip.

José Mejuto joshyfun at gmail.com
Wed Feb 8 14:55:03 CET 2017


El 08/02/2017 a las 14:25, Fred van Stappen escribió:

> My knowledge of web-server things is too poor to attack the
> streaming-server part of VoIP.
>

Hello,

Streaming (when done well) is different from a simple HTTP file send. To 
send audio as a regular HTTP transfer, the web engine must provide an API 
(a callback or similar) in which you can return a data block (the last 
encoded audio chunk) or finish the transfer.
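As a sketch of that callback model (in Python for brevity, since the idea is language-neutral; the names here are illustrative, not from any real FPC web framework):

```python
from collections import deque

class AudioSource:
    """Produces encoded audio chunks; returns b'' when the stream ends."""
    def __init__(self, chunks):
        self._chunks = deque(chunks)

    def next_chunk(self):
        return self._chunks.popleft() if self._chunks else b''

def on_data_requested(source):
    """Callback the web engine would invoke when it can send more data.

    Returning a non-empty block keeps the transfer going; returning b''
    tells the engine to finish the transfer.
    """
    return source.next_chunk()
```

The web engine keeps calling the callback; your code only decides what the next block is.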

For a simple job, you just need a thread that is constantly encoding the 
audio input and buffering the encoded audio in a queue. This buffer is 
not unlimited; it usually holds from 1 to 60 seconds. When you receive an 
HTTP request for data (the http engine callback) you start sending from 
the beginning of the audio queue and take note of the client identifier 
(maybe only the port is needed) and the amount of data sent. Do not send 
the whole buffer; you must send small chunks, probably less than 32K. 
When you receive another callback request, you take the identifier and 
use it to resume sending from the queue beginning plus the already-sent 
size. Of course the queue beginning, at byte zero, carries an absolute 
counter which has been incremented since the start of compression. If 
the client's new position, once calculated, falls before the queue 
start, the client is slower than the encoding speed, so you must decide: 
drop the connection, or restart sending from offset zero in the queue, 
in which case the client will probably resync with some audio artifacts.
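The bookkeeping above could be sketched like this (illustrative Python; the class, constants and method names are my own, and a real version would need a lock between the encoder thread and the http callbacks):

```python
class StreamBuffer:
    """Bounded queue of encoded audio with an absolute byte counter."""
    CHUNK_LIMIT = 32 * 1024       # send at most ~32K per callback
    MAX_BUFFERED = 60 * 16000     # e.g. ~60 s at 16 kB/s encoded (assumed)

    def __init__(self):
        self.data = bytearray()
        self.start_offset = 0     # absolute offset of data[0] since encoding began
        self.clients = {}         # client id -> absolute bytes already sent

    def append(self, encoded):
        """Called by the encoder thread with each newly encoded chunk."""
        self.data += encoded
        overflow = len(self.data) - self.MAX_BUFFERED
        if overflow > 0:          # drop the oldest bytes, advance the counter
            del self.data[:overflow]
            self.start_offset += overflow

    def next_chunk(self, client_id):
        """Called from the http callback; returns the next chunk to send."""
        pos = self.clients.get(client_id, self.start_offset)
        if pos < self.start_offset:
            # Client fell behind the buffer: restart from the queue start
            # (the alternative is to drop the connection).
            pos = self.start_offset
        rel = pos - self.start_offset
        chunk = bytes(self.data[rel:rel + self.CHUNK_LIMIT])
        self.clients[client_id] = pos + len(chunk)
        return chunk
```

Each client only stores one number (its absolute position), so slow and fast clients can share the same encoded queue.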

This job involves several threads and careful synchronization; it is not 
a trivial task. The base logic can be tested with files, so you do not 
need to be encoding real audio.

A "good" streaming service is usually coding audio in several queues and 
qualities, start to send in medium quality and jump from queue to queue 
in function of the client bandwidth. In this case you can not use the 
absolute byte position in the stream, but frame compressed position, or 
time position but the choose is encoder dependent. Also in the streaming 
engine you must fully understand how frames are encoded in the stream 
and the engine must send only complete frames (logically encoded frames, 
usually a bunch of audio samples) because the quality change can only be 
performed at frames boundaries for the codecs that support it, old ones 
like mp3 does not support arbitrary quality change, nor at frame 
boundaries (without an audible bleeepssseees).

To better understand the caveats, take a look at these Wikipedia pages:

https://en.wikipedia.org/wiki/Real-time_Transport_Protocol

https://en.wikipedia.org/wiki/Real_Time_Streaming_Protocol

And a comparison of common streaming engines:

https://en.wikipedia.org/wiki/Comparison_of_streaming_media_systems#Protocol_support
