Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer

1.
Reading technical papers from abroad, you come across many techniques built on a Ring Buffer. LMAX's Disruptor also uses a Ring Buffer to implement multi-threaded applications. The paper I am introducing today is about Ring Buffers as well.
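Both queues in the source in section 2 use this idea in its simplest form: head_ and tail_ are counters that only ever increase, and the array slot is taken as counter & Q_MASK with Q_MASK = Q_SIZE - 1. That mask equals counter % Q_SIZE only when Q_SIZE is a power of two, which the default 32 * 1024 is. Below is a minimal single-threaded sketch of that indexing. The class MaskedRing is hypothetical, not from the article, and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical single-threaded ring, for illustration only.
// head_ and tail_ only ever grow; the slot index is counter & kMask,
// which equals counter % Size only because Size is a power of two.
template<class T, std::size_t Size>
class MaskedRing {
	static_assert(Size != 0 && (Size & (Size - 1)) == 0,
		      "Size must be a power of two");
	static constexpr std::size_t kMask = Size - 1;

public:
	bool push(const T &v)
	{
		if (head_ - tail_ == Size)	// full: NaiveQueue::push() blocks here
			return false;
		buf_[head_++ & kMask] = v;
		return true;
	}

	bool pop(T &out)
	{
		if (tail_ == head_)		// empty: NaiveQueue::pop() blocks here
			return false;
		out = buf_[tail_++ & kMask];
		return true;
	}

	std::size_t size() const { return head_ - tail_; }

private:
	T buf_[Size];
	std::size_t head_ = 0, tail_ = 0;	// monotonically increasing counters
};
```

With Size = 4, the fifth item pushed lands in slot 4 & 3 = 0 again, once the first item has been popped.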

I learned about the paper from the blog of the company the author heads. It was published in Linux Journal under the title “How to Scale the Work Queue in a Multicore Environment”. I am posting the full text here as well.

My Article In Linux Journal
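The lock-free queue in the source is labeled "ABA problem safe". One way to read that claim: head_, tail_ and the per-thread positions are counters that only ever increase, so a position value a thread has seen does not come back while the program runs. That holds only as long as the counter does not wrap. Assuming unsigned long is 64 bits (LP64, as on the x86-64 Linux the code targets) and an illustrative rate of 10^9 reservations per second:

$$
\frac{2^{64}\ \text{reservations}}{10^{9}\ \text{reservations/s}} \approx 1.84 \times 10^{10}\ \text{s} \approx 584\ \text{years}
$$

On 64-bit Windows (LLP64) unsigned long is only 32 bits, and the same rate would wrap it in 2^32 / 10^9 ≈ 4.3 s. Since ULONG_MAX also serves as the "idle" marker and comparisons such as head >= last_tail_ + Q_SIZE assume no wraparound, a port there would want a 64-bit type.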

2.
I am also sharing the source code presented in the paper.
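Before the source, it may help to isolate the step that push() and pop() of LockFreeQueue both repeat while they wait: each thread publishes the position it is working on in its ThrPos entry (ULONG_MAX when it holds none), and a waiting thread takes the minimum of the shared counter and every published position to learn how far it may safely go. Below is a sketch of just that scan using std::atomic. The helper lowest_pending is hypothetical; the original code uses plain variables and relies on x86 ordering instead.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper mirroring the "Update the last_head_/last_tail_"
// loops of LockFreeQueue. counter is the shared head_ or tail_; pos[i] is
// the position thread i holds, or ULONG_MAX when it holds none.
// Provided every thread publishes a lower bound before reserving (the
// first assignment in push()/pop()), all positions below the result are done.
unsigned long
lowest_pending(const std::atomic<unsigned long> &counter,
	       const std::atomic<unsigned long> *pos, std::size_t n)
{
	// Read the shared counter first: a slot reserved after this read is
	// >= the value read, so it can never be reported as finished.
	unsigned long min = counter.load();
	for (std::size_t i = 0; i < n; ++i)
		min = std::min(min, pos[i].load());	// ULONG_MAX never lowers min
	return min;
}
```

For example, with the counter at 10 and threads holding positions 7 and 9, the result is 7: slots 0 to 6 are safe, while slot 7 may still be in use.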

/**
 * Implementation of Naive and Lock-free ring-buffer queues and
 * performance and verification tests.
 *
 * Build with (g++ version must be >= 4.5.0):
 * $ g++ -Wall -std=c++0x -O2 -D DCACHE1_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE` lockfree_rb_q.cc -lpthread
 *
 * I verified the program with g++ 4.5.3, 4.6.1 and 4.6.3.
 *
 * Copyright (C) 2012-2013 Alexander Krizhanovsky (ak@natsys-lab.com).
 *
 * This file is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published
 * by the Free Software Foundation; either version 3, or (at your option)
 * any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser General Public License for more details.
 * See http://www.gnu.org/licenses/lgpl.html .
 */
#ifndef __x86_64__
#warning "The program is developed for x86-64 architecture only."
#endif
#if !defined(DCACHE1_LINESIZE) || !DCACHE1_LINESIZE
#ifdef DCACHE1_LINESIZE
#undef DCACHE1_LINESIZE
#endif
#define DCACHE1_LINESIZE	64
#endif
#define ____cacheline_aligned	__attribute__((aligned(DCACHE1_LINESIZE)))

#include <limits.h>
#include <malloc.h>
#include <sched.h>	// sched_yield()
#include <string.h>
#include <unistd.h>

#include <algorithm>	// std::max
#include <atomic>
#include <cassert>
#include <iostream>
#include <condition_variable>
#include <mutex>
#include <thread>

#define QUEUE_SIZE	(32 * 1024)

/*
 * ------------------------------------------------------------------------
 * Naive serialized ring buffer queue
 * ------------------------------------------------------------------------
 */
template<class T, unsigned long Q_SIZE = QUEUE_SIZE>
class NaiveQueue {
private:
	static const unsigned long Q_MASK = Q_SIZE - 1;

public:
	NaiveQueue()
		: head_(0), tail_(0)
	{
		ptr_array_ = (T **)::memalign(getpagesize(),
				Q_SIZE * sizeof(void *));
		assert(ptr_array_);
	}

	~NaiveQueue()
	{
		::free(ptr_array_);
	}

	void
	push(T *x)
	{
		std::unique_lock<std::mutex> lock(mtx_);

		// Members cannot be captured by name; capture this instead.
		cond_overflow_.wait(lock, [this]() {
					return tail_ + Q_SIZE > head_;
				});

		ptr_array_[head_++ & Q_MASK] = x;

		cond_empty_.notify_one();
	}

	T *
	pop()
	{
		std::unique_lock<std::mutex> lock(mtx_);

		cond_empty_.wait(lock, [this]() {
					return tail_ < head_;
				});

		T *x = ptr_array_[tail_++ & Q_MASK];

		cond_overflow_.notify_one();

		return x;
	}

private:
	unsigned long		head_, tail_;
	std::condition_variable	cond_empty_;
	std::condition_variable	cond_overflow_;
	std::mutex		mtx_;
	T			**ptr_array_;
};


/*
 * ------------------------------------------------------------------------
 * Lock-free N-producers M-consumers ring-buffer queue.
 * ABA problem safe.
 *
 * This implementation is a bit complicated, so it may make sense to use
 * classic list-based queues instead. See:
 * 1. D.Fober, Y.Orlarey, S.Letz, "Lock-Free Techniques for Concurrent
 *    Access to Shared Objects"
 * 2. M.M.Michael, M.L.Scott, "Simple, Fast and Practical Non-Blocking and
 *    Blocking Concurrent Queue Algorithms"
 * 3. E.Ladan-Mozes, N.Shavit, "An Optimistic Approach to Lock-Free FIFO Queues"
 *
 * See also implementation of N-producers M-consumers FIFO and
 * 1-producer 1-consumer ring-buffer from Tim Blechmann:
 *	http://tim.klingt.org/boost_lockfree/
 *	git://tim.klingt.org/boost_lockfree.git
 * 
 * See Intel 64 and IA-32 Architectures Software Developer's Manual,
 * Volume 3, Chapter 8.2 Memory Ordering for x86 memory ordering guarantees.
 * ------------------------------------------------------------------------
 */
// thread_local is not implemented in the GCC versions targeted here (4.5/4.6).
static size_t __thread __thr_id;

/**
 * @return continuous thread IDs starting from 0, as opposed to pthread_self().
 */
inline size_t
thr_id()
{
	return __thr_id;
}

inline void
set_thr_id(size_t id)
{
	__thr_id = id;
}

template<class T,
	decltype(thr_id) ThrId = thr_id,
	unsigned long Q_SIZE = QUEUE_SIZE>
class LockFreeQueue {
private:
	static const unsigned long Q_MASK = Q_SIZE - 1;

	struct ThrPos {
		unsigned long head, tail;
	};

public:
	LockFreeQueue(size_t n_producers, size_t n_consumers)
		: n_producers_(n_producers),
		n_consumers_(n_consumers),
		head_(0),
		tail_(0),
		last_head_(0),
		last_tail_(0)
	{
		auto n = std::max(n_consumers_, n_producers_);
		thr_p_ = (ThrPos *)::memalign(getpagesize(), sizeof(ThrPos) * n);
		assert(thr_p_);
		// Set per thread tail and head to ULONG_MAX.
		::memset((void *)thr_p_, 0xFF, sizeof(ThrPos) * n);

		ptr_array_ = (T **)::memalign(getpagesize(),
				Q_SIZE * sizeof(void *));
		assert(ptr_array_);
	}

	~LockFreeQueue()
	{
		::free(ptr_array_);
		::free(thr_p_);
	}

	ThrPos&
	thr_pos() const
	{
		assert(ThrId() < std::max(n_consumers_, n_producers_));
		return thr_p_[ThrId()];
	}

	void
	push(T *ptr)
	{
		/*
		 * Request next place to push.
		 *
		 * The second assignment is atomic only for the head shift, so
		 * without the first one there would be a time window in which
		 * thr_p_[tid].head = ULONG_MAX while head could be shifted
		 * significantly by other threads, so pop() could set last_head_
		 * to head, past the slot this thread has not written yet.
		 * After that, thr_p_[tid].head is set to the old head value
		 * (which is stored in a local CPU register) and the slot is
		 * written with @ptr.
		 *
		 * The first assignment guarantees that pop() sees values for
		 * head and thr_p_[tid].head not greater than they will be
		 * after the second assignment with head shift.
		 *
		 * Loads and stores are not reordered with locked instructions,
		 * so we don't need a memory barrier here.
		 */
		thr_pos().head = head_;
		thr_pos().head = __sync_fetch_and_add(&head_, 1);

		/*
		 * We do not know when a consumer uses the pop()'ed pointer,
		 * so we cannot overwrite it and have to wait for the lowest tail.
		 */
		while (__builtin_expect(thr_pos().head >= last_tail_ + Q_SIZE, 0))
		{
			::sched_yield();

			auto min = tail_;

			// Update the last_tail_.
			for (size_t i = 0; i < n_consumers_; ++i) {
				auto tmp_t = thr_p_[i].tail;

				// Force compiler to use tmp_t exactly once.
				asm volatile("" ::: "memory");

				if (tmp_t < min)
					min = tmp_t;
			}
			last_tail_ = min;
		}

		ptr_array_[thr_pos().head & Q_MASK] = ptr;

		// Allow consumers to eat the item.
		thr_pos().head = ULONG_MAX;
	}

	T *
	pop()
	{
		/*
		 * Request next place from which to pop.
		 * See comments for push().
		 *
		 * Loads and stores are not reordered with locked instructions,
		 * so we don't need a memory barrier here.
		 */
		thr_pos().tail = tail_;
		thr_pos().tail = __sync_fetch_and_add(&tail_, 1);

		/*
		 * The tid'th place in ptr_array_ is reserved by the thread:
		 * this place shall never be rewritten by push(), as
		 * last_tail_ in push() guarantees.
		 * last_head_ guarantees that no consumer eats the item
		 * before the producer that reserved the position writes to it.
		 */
		while (__builtin_expect(thr_pos().tail >= last_head_, 0))
		{
			::sched_yield();

			auto min = head_;

			// Update the last_head_.
			for (size_t i = 0; i < n_producers_; ++i) {
				auto tmp_h = thr_p_[i].head;

				// Force compiler to use tmp_h exactly once.
				asm volatile("" ::: "memory");

				if (tmp_h < min)
					min = tmp_h;
			}
			last_head_ = min;
		}

		T *ret = ptr_array_[thr_pos().tail & Q_MASK];
		// Allow producers to rewrite the slot.
		thr_pos().tail = ULONG_MAX;
		return ret;
	}

private:
	/*
	 * The hottest members are cacheline aligned to avoid
	 * false sharing.
	 */

	const size_t n_producers_, n_consumers_;
	// currently free position (next to insert)
	unsigned long	head_ ____cacheline_aligned;
	// current tail, next to pop
	unsigned long	tail_ ____cacheline_aligned;
	// last not-processed producer's pointer
	unsigned long	last_head_ ____cacheline_aligned;
	// last not-processed consumer's pointer
	unsigned long	last_tail_ ____cacheline_aligned;
	ThrPos		*thr_p_;
	T		**ptr_array_;
};


/*
 * ------------------------------------------------------------------------
 *	Tests for naive and lock-free queues
 * ------------------------------------------------------------------------
 */
static const auto N = QUEUE_SIZE * 1024;
static const auto CONSUMERS = 16;
static const auto PRODUCERS = 16;

typedef unsigned char	q_type;

static const q_type X_EMPTY = 0; // the address skipped by producers
static const q_type X_MISSED = 255; // the address skipped by consumers
q_type x[N * PRODUCERS];
std::atomic<int> n(0);

template<class Q>
struct Worker {
	Worker(Q *q, size_t id = 0)
		: q_(q),
		thr_id_(id)
	{}

	Q *q_;
	size_t thr_id_;
};

template<class Q>
struct Producer : public Worker<Q> {
	Producer(Q *q, size_t id)
		: Worker<Q>(q, id)
	{}

	void operator()()
	{
		set_thr_id(Worker<Q>::thr_id_);

		for (auto i = thr_id(); i < N * PRODUCERS; i += PRODUCERS) {
			x[i] = X_MISSED;
			Worker<Q>::q_->push(x + i);
		}
	}
};

template<class Q>
struct Consumer : public Worker<Q> {
	Consumer(Q *q, size_t id)
		: Worker<Q>(q, id)
	{}

	void operator()()
	{
		set_thr_id(Worker<Q>::thr_id_);

		while (n.fetch_add(1) < N * PRODUCERS) {
			q_type *v = Worker<Q>::q_->pop();
			assert(v);
			assert(*v == X_MISSED);
			*v = (q_type)(thr_id() + 1); // don't write zero
		}
	}
};

template<class Q>
void
run_test(Q &&q)
{
	std::thread thr[PRODUCERS + CONSUMERS];

	n.store(0);
	::memset(x, X_EMPTY, N * sizeof(q_type) * PRODUCERS);

	// Run producers.
	for (auto i = 0; i < PRODUCERS; ++i)
		thr[i] = std::thread(Producer<Q>(&q, i));

	::usleep(10 * 1000); // sleep to let the producers fill the queue

	/*
	 * Run consumers.
	 * Create consumers with the same thread IDs as producers.
	 * The IDs are used for queue head and tail indexing only,
	 * so we only care about different IDs for threads of the same type.
	 */
	for (auto i = 0; i < CONSUMERS; ++i)
		thr[PRODUCERS + i] = std::thread(Consumer<Q>(&q, i));

	// Wait for all threads completion.
	for (auto i = 0; i < PRODUCERS + CONSUMERS; ++i)
		thr[i].join();

	// Check data.
	auto res = 0;
	std::cout << "check X data..." << std::endl;
	for (auto i = 0; i < N * PRODUCERS; ++i) {
		if (x[i] == X_EMPTY) {
			std::cout << "empty " << i << std::endl;
			res = 1;
			break;
		} else if (x[i] == X_MISSED) {
			std::cout << "missed " << i << std::endl;
			res = 2;
			break;
		}
	}
	std::cout << (res ? "FAILED" : "Passed") << std::endl;
}

int
main()
{
	LockFreeQueue<q_type> lf_q(PRODUCERS, CONSUMERS);
	run_test<LockFreeQueue<q_type>>(std::move(lf_q));

	NaiveQueue<q_type> n_q;
	run_test<NaiveQueue<q_type>>(std::move(n_q));

	return 0;
}
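The program depends on GCC and Linux specifics (__attribute__((aligned)), __thread, __sync_fetch_and_add, memalign, getconf) and, per its own comments, on x86 memory ordering, which is consistent with the Visual Studio errors reported in the comments below. The following is a sketch of the same reservation scheme in standard C++ only, assuming C++17 (so that new honors the 64-byte alignment of the per-thread entries). It is an adaptation, not the author's code: the names PortableLFQueue and run_demo, the explicit tid parameter and the 64-byte line size are assumptions, and every atomic uses the default sequentially consistent ordering instead of the original's x86 reasoning.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical portable variant of LockFreeQueue (not the author's code):
// std::atomic with default seq_cst ordering instead of x86-specific
// reasoning, alignas(64) instead of the GCC attribute, 64-bit positions
// (unsigned long is 32-bit on Windows), explicit thread ids instead of __thread.
template<class T, std::uint64_t Q_SIZE = 1024>
class PortableLFQueue {
	static_assert(Q_SIZE && (Q_SIZE & (Q_SIZE - 1)) == 0,
		      "Q_SIZE must be a power of two");
	static constexpr std::uint64_t Q_MASK = Q_SIZE - 1;

	struct alignas(64) ThrPos {	// UINT64_MAX means "no slot held"
		std::atomic<std::uint64_t> head{UINT64_MAX}, tail{UINT64_MAX};
	};

public:
	// tid must be < max(n_producers, n_consumers), unique per producer/consumer.
	PortableLFQueue(std::size_t n_producers, std::size_t n_consumers)
		: n_prod_(n_producers), n_cons_(n_consumers),
		  thr_p_(new ThrPos[std::max(n_producers, n_consumers)]) {}

	void push(std::size_t tid, T *ptr)
	{
		ThrPos &me = thr_p_[tid];
		me.head.store(head_.load());		// publish a lower bound first
		me.head.store(head_.fetch_add(1));	// then reserve slot h
		const std::uint64_t h = me.head.load();
		// Wait until slot h - Q_SIZE has been consumed.
		while (h >= last_tail_.load() + Q_SIZE) {
			std::this_thread::yield();
			std::uint64_t min = tail_.load();
			for (std::size_t i = 0; i < n_cons_; ++i)
				min = std::min(min, thr_p_[i].tail.load());
			last_tail_.store(min);
		}
		slots_[h & Q_MASK] = ptr;
		me.head.store(UINT64_MAX);		// consumers may read slot h
	}

	T *pop(std::size_t tid)
	{
		ThrPos &me = thr_p_[tid];
		me.tail.store(tail_.load());
		me.tail.store(tail_.fetch_add(1));	// reserve slot t
		const std::uint64_t t = me.tail.load();
		// Wait until the producer that reserved slot t has written it.
		while (t >= last_head_.load()) {
			std::this_thread::yield();
			std::uint64_t min = head_.load();
			for (std::size_t i = 0; i < n_prod_; ++i)
				min = std::min(min, thr_p_[i].head.load());
			last_head_.store(min);
		}
		T *ret = slots_[t & Q_MASK];
		me.tail.store(UINT64_MAX);		// producers may reuse slot t
		return ret;
	}

private:
	const std::size_t n_prod_, n_cons_;
	alignas(64) std::atomic<std::uint64_t> head_{0};
	alignas(64) std::atomic<std::uint64_t> tail_{0};
	alignas(64) std::atomic<std::uint64_t> last_head_{0};
	alignas(64) std::atomic<std::uint64_t> last_tail_{0};
	std::unique_ptr<ThrPos[]> thr_p_;
	T *slots_[Q_SIZE] = {};
};

// Usage sketch: 4 producers push 4000 distinct pointers, 4 consumers pop
// 1000 each; returns true if every item was popped exactly once.
inline bool run_demo()
{
	std::vector<int> items(4000, 0);
	PortableLFQueue<int, 64> q(4, 4);
	std::vector<std::thread> ts;
	for (std::size_t id = 0; id < 4; ++id) {
		ts.emplace_back([&q, &items, id] {
			for (std::size_t i = id; i < items.size(); i += 4)
				q.push(id, &items[i]);
		});
		ts.emplace_back([&q, id] {
			for (int k = 0; k < 1000; ++k)
				++*q.pop(id);	// mark the item this consumer got
		});
	}
	for (auto &t : ts)
		t.join();
	return std::all_of(items.begin(), items.end(),
			   [](int v) { return v == 1; });
}
```

Sequentially consistent ordering is the conservative choice here, because the original's justification (locked instructions are not reordered with loads and stores) is an x86 guarantee that weaker architectures do not provide. On x86 every seq_cst store costs a locked instruction or a fence, so this version is likely slower than the original.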

2 Comments

TheLim August 18, 2021 at 4:42 PM

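I tried to run this source's RunTest and got errors.
OS: Windows 10
Development environment: Visual Studio 2015

Since I am not on Linux, I replaced the #include parts with substitute code from places like SourceForge, but then it reports that aligned and DCACHE1_LINESIZE are not declared.
Could you share a Windows version of the source as well?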

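1. smallake (Post author) August 19, 2021 at 1:04 PM
  
  This is not source code I developed, so please refer to the original at the URL. There are also a great many libraries for MPMC queues. Please keep in mind that this material is from 2013.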
  
