I Introduction
Many relationships are more than pairwise relations: entities are often grouped into sets, corresponding to adic relationships. Each of these sets can be viewed as a collaboration between entities. Hypergraphs naturally represent adic relations. It has been shown that facets of an information space can be modeled by hypergraphs [1]: each facet corresponds to a type of metadata. The different facets are then linked by reference data attached to hyperedges within that face. The step forward is to highlight important information: it is commonly achieved in hypergraphs using random walks [2, 3]. Reference [3] shows that the weighting of vertices at the level of the hyperedges in a hypergraph allows better information retrieval. These two approaches  [2, 3]  mainly focus on vertices; but as hyperedges are linked to references that can be used as pivots in between the different facets [4, 1], it is also interesting to highlight important hyperedges. For instance, in a document database, different metadata can be used to label authors, author keywords, processed keywords, categories, added tags: the pivots between the different facets of this information space correspond to the documents themselves. In the specific case of tags, it can be important to have weights attached to them if the users are allowed to attach tags to documents.
Hyperedgebased weighting of vertices is easier to achieve through multisets: multisets store information on multiplicity of elements. We use multisets family over a set of vertices, called hyperbag graph  hbgraph for short  as an extension of hypergraphs. Hbgraph multisets play the role of the hyperedges in hypergraph: they are called hbedges. We want to answer the following research questions: “Can we find a network model and a diffusion process that not only rank vertices but also rank hbedges in hbgraphs?”. We develop an iterative exchange approach in hbgraphs with twophase steps that allows to extract information not only at the vertex level but also at the hbedge level.
We validate our approach by using randomly generated hbgraphs. The hbgraph visualisation highlights not only vertices but also hbedges using the exchange process. We show that the exchangebased diffusion process allows proper coloring of vertices with high connectivity and highlights hbedges with a normalisation approach  allowing small hbedges to have a chance to be highlighted.
This paper contributes to present an exchangebased diffusion process that allows not only the ranking of vertices but also of hbedges. It formalizes exchanges by using hbgraphs that can naturally cope for elements multiplicity. It contributes also to a novel visualisation of this kind of network included in each facet of the information space.
In Section II, the related art is listed and the mathematical background is given in Section III. The construction of the formalisation of the exchange process is presented in Section IV. Results and evaluation are given in Section V and future work and conclusion are addressed in Section VI.
Ii Related work
A hypergraph on a finite set of vertices (or vertices) is defined in [5] as a family of hyperedges where each hyperedge is a nonempty subset of and such that . A hypergraph is said edgeweighted if there exists an application .
In a weighted hypergraph the degree of a vertex is defined as:
The volume of is defined as:
The incident matrix of a hypergraph is the matrix of , where .
Random walks are largely used to evaluate the importance of vertices. In [2], a random walk on a hypergraph is defined as choosing a hyperedge
with a probability proportional to
; in a given hyperedge a vertex is uniformly randomly chosen. The probability transition from a vertex to a vertex is:where is the degree of a hyperedge defined in [2] as its cardinality. This random walk has a stationary state which is shown to be for [6]. The process differs from the one we propose: our diffusion process is done by successive steps from a random initial vertex on vertices and hyperedges.
Reference [3]
defines a random walk to allow the use of weighted hypergraph with weight functions both on hyperedges and on vertices: a vector of weights is built for each vertex making weights of vertices hyperedgebased; a random walk similar to the one above is built that takes into account the weight of the vertices. The evaluation is done using a hypergraph built from a public dataset of computer science conference proceedings; each document is seen as a hyperedge that contains keywords; hyperedges are weighted by citation score and vertices of a hyperedge are weighted with a tfidf score. Reference
[3] shows that a random walk on the (double) weighted hypergraph enables vertex ranking with higher precision than with unweighted vertices random walk. The process differs again from our proposal: our process not only enables simultaneous alternative updates of vertices and hbedges values but also allow the ranking of hbedges. We also introduce a new theoretical framework to achieve this.Random walks are related to diffusion processes. [7] use random walks in hypergraph to do image matching. [8] builds higher order random walks in hypergraph and constructs a generalised Laplacian attached to graphs generated by their random walks.
Hypergraphs are used in multifeature indexing to help the retrieval of images [9]. For each image a hyperedge gathers the first
most similar images based on different features. Hyperedges are weighted by average similarity. A spectral clustering algorithm is applied to divide the dataset into
subhypergraphs. A random walk on these subhypergraphs allows to retrieve significant images: they are used to build a new inverted index, useful to query images.Iii Mathematical background
Iiia Multisets
Our definitions on multisets are mainly based on [10]. A multiset  or mset or bag  is a pair where is a set of distinct objects and is an application from to or . is called the universe of the multiset , is called the multiplicity function of the multiset . is called the support of . The elements of the support of an mset are called its generators. A multiset where is called a natural multiset. The mcardinality of written is defined as:
Considering and two msets on the same universe , we define the empty mset, written the set of empty support. is said to be included in  written  if for all : . In this case, is called a submset of . The power multiset of , written , is the multiset of all submsets of Different operations can be defined on multisets of same universe as union, intersection and sum [10].
IiiB Hbgraphs
Hbgraphs are introduced in [ouvrard2018adjacencypart2]. A hbgraph is a family of multisets with same universe and support a subset of . The msets are called the hbedges and the elements of the vertices. We consider for the remainder of the article a hbgraph , with and the family of its hbedges.
Each hbedge has as universe and a multiplicity function associated to it: where . For a general hbgraph, each hbedge has to be seen as a weighted system of vertices, where the weights of each vertex are hbedge dependent.
A hbgraph where the multiplicity range of each hbedge is a subset of is called a natural hbgraph. A hypergraph is a natural hbgraph where the hbedges have multiplicity one for every vertex of their support.
The order of a hbgraph  written  is:
The support hypergraph of a hbgraph is the hypergraph whose vertices are the ones of the hbgraph and whose hyperedges are the support of the hbedges in a onetoone way. We write it , where .
The hbstar of a vertex is the multiset  written  defined as:
The mdegree of a vertex of a hbgraph  written  is defined as:
The degree of a vertex of a hbgraph  written  corresponds to the degree of this vertex in the support hypergraph
The matrix is called the incident matrix of the hbgraph .
A weighted hbgraph is a hbgraph where the hbedges are weighted by . An unweighted hbgraph is then a weighted hbgraph with for all .
A strict mpath in a hbgraph from a vertex to a vertex is a vertex / hbedge alternation with hbedges to and vertices to such that , , and and that for all , .
A strict mpath in a hbgraph corresponds to a unique path in the hbgraph support hypergraph called the support path. In this article we abusively call it a path of the hbgraph. The length of a path corresponds to the number of hbedges it is going through.
Representation of hbgraphs can be achieved either by using submset representation or by using edge representation. In this article we use the extravertex representation of the support hypergraph of the hbgraph: an extravertex is added for each hbedge. Each hbedge is represented by enabling a link in between each vertex of the hbedge support and the hbedge extravertex.
Iv Exchangebased diffusion in hbgraphs
In diffusion processes and random walks an initial vertex is chosen. The diffusion process leads to homogenising the information over the structure. Random walks in hypergraphs rank vertices by the number of times they are reached and this ranking is related to the structure of the network itself. Several random walks with random choices of the starting vertex can be needed to achieve ranking by averaging.
We consider a weighted hbgraph with and ; we write the incident matrix of the hbgraph.
At time we set a distribution of values over the vertex set:
and a distribution of values over the hbedge set:
is the row state vector of the vertices at time and is the row state vector of the hbedges.
We consider an iterative process with twophase steps as illustrated in Figure 1. At every time step: the first phase starts at time and ends at followed by the second phase between time and .
The initialisation sets for every vertex and for every hbedge .
During the first phase between time and , each vertex of the hbgraph shares the value it holds at time with the hbedges it is connected to.
In an unweighted hbgraph, the fraction of given by of mdegree to each hbedge is , which corresponds to the ratio of multiplicity of the vertex due to the hbedge over the total degree of hbedges that contains in their support.
In a weighted hbgraph, each hbedge has a weight . The value of a vertex has to be shared by taking not only the multiplicity of the vertices in the hbedge but also the weight of a hbedge into account.
The weights of the hbedges are stored in a column vector . We also consider the weight diagonal matrix .
We introduce the weighted degree matrix:
where is called the weighted degree of the vertex . It is:
The contribution to the value attached to hbedge of weight from vertex is:
It corresponds to the ratio of weighted multiplicity of the vertex in over the total weighted degree of the hbedges where is in the support.
And the value is calculated by summing over the vertex set:
Hence, we obtain:
(1) 
During the second phase that starts at time , the hbedges share their values between the vertices they hold taking into account the multiplicity of the vertices in the hbedge. Every value is modulated by the weight of the hbedge it comes from.
The contribution to given by a hbedge of weight to the vertex of multiplicity is:
The value is then obtained by:
By writing for and writing the diagonal matrix of size , it comes:
(2) 
(3) 
It is valuable to keep a trace of the intermediate state as it records the importance of the hbedges.
Writing , it follows from 3:
(4) 
V Results and evaluation
This diffusion by exchange process has been validated with two experiments: the first experiment generates a random hbgraph to validate our approach and the second compares the results to a classical random walk on the hbgraph.
We built a random unweighted hbgraph generator. The generator allows to construct singlecomponent hbgraphs or hbgraphs with multiple connected components. A single connected component is built by choosing the number of intermediate vertices that link the different components to ensure that a single component hbgraph is obtained. We generate vertices. We start by building each component and then interconnect them. Let be the number of components. A first set of interconnected vertices is built by choosing vertices out of the . The remaining vertices are then separated into groups. In each of these groups we generate two groups of vertices: a first set of vertices and a second set of vertices with , . The number of hbedges to be built is adjustable and shared between the different groups. The mcardinality of a hbedge is chosen randomly below a maximum tunable threshold. The vertices are considered as important vertices and must be present in a certain amount of hbedges per group; the number of important vertices in a hbedge is randomly fixed below a maximum number. The completion of the hbedge is done by choosing randomly vertices in the set. The random choice made into this two groups is tuned to follow a power law distribution: it implies that some vertices occur more often than others. Interconnection between the components is achieved by choosing vertices in and inserting them randomly into the hbedges built.
We apply our diffusion process on these generated hbgraphs: after a few iterations we visualize the hbgraphs to show the evolution of the vertex value with a gradient coloring scale. We also take advantage of the halfstep to highlight hbedges in the background to show important hbedges with an other gradient coloring scale.
To get proper evaluation and show that vertices with the highest values correspond to vertices that are important in the network  in the way they are central for the connectivity  we compute the eccentricity of vertices from a subset of the vertices to the remaining of the vertices. Eccentricity of a vertex in a graph is the length of a maximal shortest path between this vertex and the other vertices of the graph: extending this definition to hbgraphs is straightforward. If the graph is disconnected then each vertex has infinite eccentricity.
For the purpose of evaluation, in this article, we define a relative eccentricity as the length of a maximal shortest path starting from a given vertex in and ending with any vertices of ; the relative eccentricity is calculated for each vertex of provided that it is connected to vertices of ; otherwise it is set to .
For the vertices set , the subset is built by using a threshold value : vertices with value above this threshold are gathered into a subset of . We consider the set of vertices with values below the threshold. We evaluate the relative eccentricity of each vertex of to vertices of in the support hypergraph of the corresponding hbgraph.
Assuming that we stop iterating at time , we let vary from 0 to the value  obtained by iterating the algorithm on the hbgraph  by incremental steps and until the eccentricity is kept above 0, first of the two achieved. In order to have a ratio we calculate:
where is the reference normalised value, defined as for the hbgraph . This ratio has values increasing by steps from 0 to .
We show the results obtained in Figure 4: we plot two curves. The first plot corresponds to the maximal length of the path between vertices of and vertices of that are connected in function of the value of : the length of the path corresponds to the half of the length of the path observed in the extravertex graph representation of the hbgraph support hypergraph as in between two vertices of there is an extravertex that represent the hbedge (or the support hyperedge). The second curve plots the percentage of vertices that are in over the vertex set in function of . When increases the number of elements in naturally decreases while they are closer to the elements of , marking the fact that they are central.
Figure 5 shows that high values of correspond to vertices that are highly connected either by degree or by mdegree. Hence vertices that are in the positive side of the scale color in Figure 4 correspond to highly connected vertices: the closer to red on the right scale they are, the higher the value of is.
A similar approach is taken for the hbedges: assuming that the diffusion process stops at time , we use the function to partition the set of hbedges into two subsets for a given threshold : of the hbedges that have values above the threshold and the one gathering hbedges that have values below .
varies from 0 to by incremental steps while keeping the eccentricity is kept above 0, first of the two conditions achieved. In the hbgraph representation, each hbedge corresponds to an extravertex. Each time we evaluate the length of the maximal shortest path linking one vertex of to one vertex of that are connected in the hbgraph support hypergraph extravertex graph representation: the length of the path corresponds to the half of the one obtained from the graph for the same reason than before. In Figure 4 we observe for the hbedges the same trend than the one observed for vertices: the length of the maximal path between two hbedges decreases as the ratio increases while the percentage of vertices in over decreases.
Figure 6 shows on the left figure the high correlation between the value of and the cardinality of ; the right figure shows that the correlation between value of and the mcardinality of is even stronger.
The results obtained after five iterations on hbgraphs with different configurations show that we always retrieve the important vertices as the most highlighted. The diffusion by exchange process also highlights additional vertices that were not in the first group but that are at the confluence of different hbedges. The results on the hbedges show that the value obtained is highly correlated to the mcardinality of the hyperedges. To color the hbedges as it is done in Figure 4 we calculate the ratio , where corresponds to the value obtained from the vertices of the hbedge support by giving to each of them the reference value. Hbedges are colored using , the higher its value, the closer to red the color is: we use the left gradient color bar for it.
We have generated random walks on the hbgraphs with random choice of hbedges when the walker is on a vertex with a distribution of probability and a random choice of the vertex when the walker is on a hbedge with a distribution of probability We let the possibility of teleportation to an other vertex from a vertex with a tunable value : represents the probability to be teleported. We choose . We count the number of passage of the walker through each vertex and each hbedge. We stop the random walk when the hbgraph is fully explored. We iterate times the random walk, varying.
Figure 7 shows that after 100 iterations there is weak correlation between the rank obtained by the random walk and our diffusion process. There is no correlation at all with the mdegree of the vertices and the degree of vertices as shown in Figure 8. 100 iterations for the random walk take 6.31 s while it takes 0.009 ms to achieve the 5 iterations needed in the exchangebased approach.
Vi Future work and Conclusion
The results obtained by using hbgraph highlight the possibility of using hbedges for analyzing networks; they confirm that vertices are highlighted due to their connectivity. The highlighting of the hbedges has been achieved by using the intermediate step of our diffusion process: to achieve it conveniently without having a ranking by hbedge mcardinality we normalized it. Different applications can be thought in particular in the search of tagged multimedia documents: sharing of keywords, geographic location, or any valuable information contained in the annotations. Using tagged documents ranking by this means could help in creating summary for visualisation. Our approach is seen as a strong basis to refine the approach of [9].
Acknowledgments
This work is part of the PhD of Xavier OUVRARD, done at UniGe, supervised by Stéphane MARCHANDMAILLET and founded by a doctoral position at CERN, in Collaboration Spotting team, supervised by JeanMarie LE GOFF. A special thanks to André RATTINGER for the exchanges we have daily on our respective PhD.
References
 [1] X. Ouvrard, J.M. Le Goff, and S. MarchandMaillet, “A hypergraph based framework for modelisation and visualisation of high dimension multifacetted data,” Soon on Arxiv, 2018.
 [2] D. Zhou, J. Huang, and B. Schölkopf, “Learning with hypergraphs: Clustering, classification, and embedding,” in Advances in neural information processing systems, pp. 1601–1608, 2007.
 [3] A. Bellaachia and M. AlDhelaan, “Random walks in hypergraph,” in Proceedings of the 2013 International Conference on Applied Mathematics and Computational Methods, Venice Italy, pp. 187–194, 2013.
 [4] M. Dörk, N. H. Riche, G. Ramos, and S. Dumais, “Pivotpaths: Strolling through faceted information spaces,” IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 12, pp. 2709–2718, 2012.
 [5] C. Berge and E. Minieka, Graphs and hypergraphs, vol. 7. NorthHolland publishing company Amsterdam, 1973.
 [6] A. Ducournau and A. Bretto, “Random walks in directed hypergraphs and application to semisupervised image segmentation,” Computer Vision and Image Understanding, vol. 120, pp. 91–102, 2014.

[7]
J. Lee, M. Cho, and K. M. Lee, “Hypergraph matching via reweighted random
walks,” in
Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on
, pp. 1633–1640, IEEE, 2011.  [8] L. Lu and X. Peng, “Highordered random walks and generalized laplacians on hypergraphs.,” in WAW, pp. 14–25, Springer, 2011.

[9]
Z. Xu, J. Du, L. Ye, and D. Fan, “Multifeature indexing for image retrieval based on hypergraph,” in
Cloud Computing and Intelligence Systems (CCIS), 2016 4th International Conference on, pp. 494–500, IEEE, 2016.  [10] D. Singh, A. Ibrahim, T. Yohanna, and J. Singh, “An overview of the applications of multisets,” Novi Sad Journal of Mathematics, vol. 37, no. 3, pp. 73–92, 2007.
Comments
There are no comments yet.