多线程 – 并行遍历图形

发布时间：2020-12-14 05:49:34 所属栏目：Java 来源：网络整理

导读：我正在修改考试(仍然)并且遇到了一个让我难过的问题(如下所示).总而言之,我认为问题是“想想any_old_process必须遍历图形并对它找到的对象做一些工作,包括添加更多工作.”我的问题是,什么数据结构可以并行化以实现问题中提出的目标？ The role of a garbage

我正在修改考试(仍然)并且遇到了一个让我难过的问题(如下所示).总而言之,我认为问题是“想想any_old_process必须遍历图形并对它找到的对象做一些工作,包括添加更多工作.”我的问题是,什么数据结构可以并行化以实现问题中提出的目标？

The role of a garbage collector (GC) is to reclaim unused memory.
Tracing collectors must identify all live objects by traversing graphs
of objects induced by aggregation relationships. In brief,the GC has
some work-list of tasks to perform. It repeatedly (a) acquires a task
(e.g. an object to inspect),(b) performs the task (e.g. marks the
object unless it is already marked),and (c) generates further tasks
(e.g. adds the children of an unmarked task to the work-list). It is
desirable to parallelise this operation.

In a single-threaded
environment,the work-list is usually a single LIFO stack. What would
you have to do to make this safe for a parallel GC? Would this be a
sensible design for a parallel GC? Discuss designs of data structure
to support a parallel GC that would scale better. Explain why you
would expect them to scale better.

解决方法

图形的自然数据结构是图形,即可以引用其他元素的一组图形元素(节点).但是,为了更好地缓存重用,可以将元素放置/分配在一个或多个数组(通常是向量)中,以便将相邻元素尽可能地放在内存中.通常,每个元素或一组元素都应该有一个互斥锁(spin_mutex)来保护对它的访问,争用意味着其他一些线程忙于处理它,所以不需要等待.但是,如果可能的话,最好在标志/状态字段上进行原子操作,以便在没有锁定的情况下将元素标记为已访问.例如,最简单的数据结构可以是：

struct object {
    vector<object*> references;
    atomic<bool> is_visited; // for simplicity,or epoch counter
                             // if nothing resets it to false
    void inspect();          // processing method
};
vector<object> objects;      // also for simplicity,if it can be for real
                             // things like `parallel_for` would be perfect here

鉴于此数据结构和GC工作方式的描述,它完全适合像divide-and-conquer pattern这样的递归并行：

void object::inspect() {
    if( ! is_visited.exchange(true) ) {
        for( object* o : objects )   // alternatively it can be `parallel_for` in some variants
            cilk_spawn o->inspect(); // for Cilk or `task_group::run` for TBB or PPL
        // further processing of the object
    }
}

如果问题中的数据结构是任务的组织方式.我推荐一个工作窃取调度程序(如tbb或cilk).关于这个主题有很多论文.简单来说,每个工作线程都有自己但共享的任务副本,当deque为空时,一个线程偷走别人的任务.

可伸缩性来自于每个任务可以添加一些其他可以在并行工作的任务的属性.

（编辑：李大同）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!